Genetic Programming II
Complex Adaptive Systems
John H. Holland, Christopher Langton, and Stewart W. Wilson, advisors
Adaptation in Natural and Artificial Systems: An Introductory Analysis with
Applications to Biology, Control, and Artificial Intelligence
John H. Holland
Toward a Practice of Autonomous Systems: Proceedings of the First European
Conference on Artificial Life
edited by Francisco J. Varela and Paul Bourgine
Genetic Programming: On the Programming of Computers by Means of Natural
Selection
John R. Koza
From Animals to Animats 2: Proceedings of the Second International Conference
on Simulation of Adaptive Behavior
edited by Jean-Arcady Meyer, Herbert L. Roitblat, and Stewart W. Wilson
Intelligent Behavior in Animals and Robots
David McFarland and Thomas Bösser
Advances in Genetic Programming
edited by Kenneth E. Kinnear, Jr.
Genetic Programming II: Automatic Discovery of Reusable Programs
John R. Koza
Also Available:
Genetic Programming: The Movie
John R. Koza and James P. Rice
Genetic Programming II Videotape: The Next Generation
John R. Koza and James P. Rice
Genetic Programming II
Automatic Discovery of
Reusable Programs
John R. Koza
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
©1994 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or
mechanical means (including photocopying, recording, or information storage or retrieval)
without permission in writing from the publisher.
Set in Palatino by Proteus Typography, Palo Alto, California.
Printed and bound in the United States of America.
The programs, procedures, and applications presented in this book have been included for
their instructional value. The publisher and the author offer NO WARRANTY OF FITNESS
OR MERCHANTABILITY FOR ANY PARTICULAR PURPOSE and do not accept any
liability with respect to these programs, procedures, and applications.
Library of Congress Cataloging-in-Publication Data
Library of Congress Catalog Card Number 94-76375
to my mother and father
CONTENTS
Preface
Acknowledgments
1 Introduction
2 Background on Genetic Algorithms, LISP, and Genetic Programming
3 Hierarchical Problem-Solving
4 Introduction to Automatically Defined Functions - The Two-Boxes Problem
5 Problems that Straddle the Breakeven Point for Computational Effort
6 Boolean Parity Functions
7 Determining the Architecture of the Program
8 The Lawnmower Problem
9 The Bumblebee Problem
10 The Increasing Benefits of ADFs as Problems are Scaled Up
11 Finding an Impulse Response Function
12 Artificial Ant on the San Mateo Trail
13 Obstacle-Avoiding Robot
14 The Minesweeper Problem
15 Automatic Discovery of Detectors for Letter Recognition
16 Flushes and Four-of-a-Kinds in a Pinochle Deck
17 Introduction to Biochemistry and Molecular Biology
18 Prediction of Transmembrane Domains in Proteins
19 Prediction of Omega Loops in Proteins
20 Lookahead Version of the Transmembrane Problem
21 Evolutionary Selection of the Architecture of the Program
22 Evolution of Primitives and Sufficiency
23 Evolutionary Selection of Terminals
24 Evolution of Closure
25 Simultaneous Evolution of Architecture, Primitive Functions, Terminals, Sufficiency, and Closure
26 The Role of Representation and the Lens Effect
27 Conclusion
Appendix A: List of Special Symbols
Appendix B: List of Special Functions
Appendix C: List of Type Fonts
Appendix D: Default Parameters
Appendix E: Computer Implementation
Appendix F: Annotated Bibliography of Genetic Programming
Appendix G: Electronic Mailing List and Public Repository
Bibliography
Index
Detailed Table of Contents
1 Introduction
1.1 Overview
2 Background on Genetic Algorithms, LISP, and Genetic Programming
2.1 Background on Genetic Algorithms
2.2 Background on LISP
2.3 Background on Genetic Programming
2.4 Sources of Additional Information
3 Hierarchical Problem-Solving
3.1 Hierarchical Decomposition
3.2 Recursive Application and Identical Reuse
3.3 Parameterized Reuse and Generalization
3.4 Abstraction
3.5 SOAR and Explanation-Based Generalization
4 Introduction to Automatically Defined Functions - The Two-Boxes Problem
4.1 The Problem
4.2 Preparatory Steps without ADFs
4.3 Results without ADFs
4.4 The Idea of Subroutines
4.5 The Idea of Automatically Defined Functions
4.6 Preparatory Steps with ADFs
4.7 Creation of the Initial Random Population
4.8 Structure-Preserving Crossover and Typing
4.9 Results with ADFs
4.10 Comparison of the Structural Complexity of the Solutions
4.11 Comparison of Computational Effort
4.12 Summary
5 Problems that Straddle the Breakeven Point for Computational Effort
5.1 Sextic versus Quintic Polynomial
5.1.1 Sextic Polynomial x^6 - 2x^4 + x^2
5.1.1.1 Preparatory Steps without ADFs
5.1.1.2 Results without ADFs
5.1.1.3 Preparatory Steps with ADFs
5.1.1.4 Results with ADFs
5.1.1.5 Comparison with and without ADFs
5.1.2 Quintic Polynomial x^5 - 2x^3 + x
5.1.2.1 Preparatory Steps without ADFs
5.1.2.2 Results without ADFs
5.1.2.3 Results with ADFs
5.1.2.4 Comparison with and without ADFs
5.2 The Boolean 6-Symmetry versus 5-Symmetry
5.2.1 The Boolean 6-Symmetry Problem
5.2.1.1 Preparatory Steps without ADFs
5.2.1.2 Results without ADFs
5.2.1.3 Preparatory Steps with ADFs
5.2.1.4 Results with ADFs
5.2.1.5 Comparison with and without ADFs
5.2.2 The Boolean 5-Symmetry Problem
5.2.2.1 Results without ADFs
5.2.2.2 Results with ADFs
5.2.2.3 Comparison with and without ADFs
5.3 The Four-Sine versus Three-Sine Problems
5.3.1 The Four-Sine Problem - sin x + sin 2x + sin 3x + sin 4x
5.3.1.1 Preparatory Steps without ADFs
5.3.1.2 Results without ADFs
5.3.1.3 Preparatory Steps with ADFs
5.3.1.4 Results with ADFs
5.3.1.5 Comparison with and without ADFs
5.3.2 The Three-Sine Problem - sin x + sin 2x + sin 3x
5.3.2.1 Results without ADFs
5.3.2.2 Results with ADFs
5.3.2.3 Comparison with and without ADFs
5.4 Four Occurrences versus Three Occurrences of a Reusable Constant
5.4.1 Three-Term Expression x/π + x^2/π^2 + 2πx
5.4.1.1 Preparatory Steps without ADFs
5.4.1.2 Results without ADFs
5.4.1.3 Preparatory Steps with ADFs
5.4.1.4 Results with ADFs
5.4.1.5 Comparison with and without ADFs
5.4.2 The Two-Term Expression x/π + x^2/π^2
5.4.2.1 Results without ADFs
5.4.2.2 Results with ADFs
5.4.2.3 Comparison with and without ADFs
5.5 Summary
6 Boolean Parity Functions
6.1 The Even-Parity Problem
6.2 Preparatory Steps without ADFs
6.3 Even-3-Parity without ADFs
6.4 Even-4-Parity without ADFs
6.5 Even-5-Parity without ADFs
6.6 Even-6-Parity without ADFs
6.7 Multiple Function-Defining Branches
6.8 Hierarchical Automatically Defined Functions
6.9 Preparatory Steps with ADFs
6.10 Even-3-Parity with ADFs
6.11 Even-4-Parity with ADFs
6.12 Even-5-Parity with ADFs
6.13 Even-6-Parity Problem with ADFs
6.14 Summary for the Even-3-, 4-, 5-, and 6-Parity Problems
6.15 Scaling for the Even-3-, 4-, 5-, and 6-Parity Problems
6.16 Higher-Order Even-Parity Problems
6.16.1 Even-7-Parity Problem
6.16.2 Even-8-Parity Problem
6.16.3 Even-9-Parity Problem
6.16.4 Even-10-Parity Problem
6.16.5 Even-11-Parity Problem
7 Determining the Architecture of the Program
7.1 Method of Prospective Analysis
7.2 Method of Providing Seemingly Sufficient Capacity
7.3 Method of Using Affordable Capacity
7.4 Method of Retrospective Analysis
7.4.1 Baseline for the Even-5-Parity Problem without ADFs
7.4.2 One Two-Argument ADF
7.4.3 One Three-Argument ADF
7.4.4 One Four-Argument ADF
7.4.5 Two Two-Argument ADFs
7.4.6 Two Three-Argument ADFs
7.4.7 Two Four-Argument ADFs
7.4.8 Three Two-Argument ADFs
7.4.9 Three Three-Argument ADFs
7.4.10 Three Four-Argument ADFs
7.4.11 Four Two-Argument ADFs
7.4.12 Four Three-Argument ADFs
7.4.13 Four Four-Argument ADFs
7.4.14 Five Two-Argument ADFs
7.4.15 Five Three-Argument ADFs
7.4.16 Five Four-Argument ADFs
7.5 Summary of Retrospective Analysis
8 The Lawnmower Problem
8.1 The Problem
8.2 Preparatory Steps without ADFs
8.3 Lawn Size of 64 without ADFs
8.4 Lawn Size of 32 without ADFs
8.5 Lawn Size of 48 without ADFs
8.6 Lawn Size of 80 without ADFs
8.7 Lawn Size of 96 without ADFs
8.8 Preparatory Steps with ADFs
8.9 Lawn Size of 64 with ADFs
8.10 Lawn Size of 32 with ADFs
8.11 Lawn Size of 48 with ADFs
8.12 Lawn Size of 80 with ADFs
8.13 Lawn Size of 96 with ADFs
8.14 Summary for Lawn Sizes of 32, 48, 64, 80, and 96
8.15 Scaling for Lawn Sizes of 32, 48, 64, 80, and 96
8.16 Wallclock Time for the Lawnmower Problem
9 The Bumblebee Problem
9.1 The Problem
9.2 Preparatory Steps without ADFs
9.3 Results with 25 Flowers without ADFs
9.4 Preparatory Steps with ADFs
9.5 Results with 25 Flowers with ADFs
9.6 Results with 20 Flowers without ADFs
9.7 Results with 20 Flowers with ADFs
9.8 Results with 15 Flowers without ADFs
9.9 Results with 15 Flowers with ADFs
9.10 Results with 10 Flowers without ADFs
9.11 Results with 10 Flowers with ADFs
9.12 Summary for 10, 15, 20, and 25 Flowers
9.13 Scaling with 10, 15, 20, and 25 Flowers
9.14 Wallclock Time for the Bumblebee Problem
10 The Increasing Benefits of ADFs as Problems are Scaled Up
10.1 The Benefits of ADFs as a Function of Problem Size
10.2 Wallclock Time
11 Finding an Impulse Response Function
11.1 The Problem
11.2 Preparatory Steps without ADFs
11.3 Results of One Run without ADFs
11.4 Results of Series of Runs without ADFs
11.5 Preparatory Steps with ADFs
11.6 Results of One Run with ADFs
11.7 Genealogical Audit Trail with ADFs
11.7.1 Crossover in the Result-Producing Branch
11.7.2 Crossover in the Function-Defining Branch
11.8 Results of Series of Runs with ADFs
11.9 Summary
12 Artificial Ant on the San Mateo Trail
12.1 The Problem
12.2 Preparatory Steps without ADFs
12.3 Results without ADFs
12.4 Preparatory Steps with ADFs
12.5 Results with ADFs
12.6 Summary
13 Obstacle-Avoiding Robot
13.1 The Problem
13.2 Preparatory Steps without ADFs
13.3 Results without ADFs
13.4 Preparatory Steps with ADFs
13.5 Results with ADFs
13.6 Summary
14 The Minesweeper Problem
14.1 The Problem
14.2 Preparatory Steps without ADFs
14.3 Results without ADFs
14.4 Preparatory Steps with ADFs
14.5 Results with ADFs
14.6 Summary
15 Automatic Discovery of Detectors for Letter Recognition
15.1 The Problem
15.2 Preparatory Steps without ADFs
15.3 Results without ADFs
15.4 Preparatory Steps with ADFs
15.5 Results with ADFs
15.6 Genealogical Audit Trails with ADFs
15.7 Detectors of Different Sizes and Shapes
15.8 Translation-Invariant Letter Recognition
15.9 Summary
16 Flushes and Four-of-a-Kinds in a Pinochle Deck
16.1 The FLUSH Problem
16.2 Preparatory Steps without ADFs
16.3 Results without ADFs
16.4 Preparatory Steps with ADFs
16.5 Results with ADFs
16.6 Flushes and Four-of-a-Kinds
17 Introduction to Biochemistry and Molecular Biology
17.1 Chromosomes and DNA
17.2 Role of Proteins
17.3 Transcription and Translation
17.4 Amino Acids and Protein Structure
17.5 Primary Structure of Proteins
17.6 Secondary Structure of Proteins
17.7 Tertiary Structure of Proteins
17.8 Quaternary Structure of Proteins
17.9 Genetic Algorithms and Molecular Biology
18 Prediction of Transmembrane Domains in Proteins
18.1 Background on Transmembrane Domains in Proteins
18.2 The Four Versions of the Transmembrane Problem
18.3 The Idea of Settable Variables, Memory and State
18.4 The Idea of Restricted Iteration
18.5 Preparatory Steps without ADFs
18.5.1 Terminal Set and Function Set
18.5.2 Correlation as the Fitness Measure
18.5.3 Fitness Cases
18.6 Results without ADFs for the Subset-Creating Version
18.7 Preparatory Steps with ADFs for the Subset-Creating Version
18.8 Results with ADFs for the Subset-Creating Version
18.9 Summary for the Subset-Creating Version
18.10 The Arithmetic-Performing Version
18.11 Summary for the Arithmetic-Performing Version
19 Prediction of Omega Loops in Proteins
19.1 Background on Omega Loops
19.2 Preparatory Steps with ADFs
19.3 Results for the Subset-Creating Version with ADFs
19.4 Results for the Arithmetic-Performing Version with ADFs
19.5 Summary of the Omega-Loop Problem
20 Lookahead Version of the Transmembrane Problem
20.1 The Problem
20.2 Partial Parsing
20.3 Preparatory Steps
20.4 Results
21 Evolutionary Selection of the Architecture of the Program
21.1 Creation of the Initial Random Population
21.2 Point Typing for Structure-Preserving Crossover
21.3 Results for the Even-5-Parity Problem
21.4 Results for the Even-4-Parity Problem
21.5 Results for the Even-3-Parity Problem
21.6 Summary
22 Evolution of Primitives and Sufficiency
22.1 Primitive Defining Branches
22.2 Results for the Even-5-Parity Problem
22.3 Results for the Boolean 6-Multiplexer Problem
22.4 Results for a Single Primitive Function
22.4.1 Boolean 6-Multiplexer Problem
22.4.2 Even-5-Parity Problem
23 Evolutionary Selection of Terminals
23.1 Preparatory Steps
23.2 Results for the Even-5-Parity Problem
24 Evolution of Closure
24.1 Undefined Values
24.2 Preparatory Steps
24.3 Results for the Even-4-Parity Problem
24.4 Results for the Even-5-Parity Problem
25 Simultaneous Evolution of Architecture, Primitive Functions, Terminals, Sufficiency, and Closure
25.1 Preparatory Steps
25.2 Results for Even-4-Parity Problem
25.3 Results for Even-5-Parity Problem
25.4 Summary
26 The Role of Representation and the Lens Effect
26.1 Even-3-, 4-, 5-, and 6-Parity Problems
26.1.1 Even-3-Parity Problem
26.1.2 Even-4-Parity Problem
26.1.3 Even-5-Parity Problem
26.1.4 Even-6-Parity Problem
26.1.5 Summary for the Parity Problems
26.2 The Lawnmower Problem
26.2.1 Lawnmower Problem with Lawn Size of 32
26.2.2 Lawnmower Problem with Lawn Size of 48
26.2.3 Lawnmower Problem with Lawn Size of 64
26.2.4 Lawnmower Problem with Lawn Size of 80
26.2.5 Lawnmower Problem with Lawn Size of 96
26.2.6 Summary for the Lawnmower Problem
26.3 The Bumblebee Problem
26.3.1 Bumblebee Problem with 10 Flowers
26.3.2 Bumblebee Problem with 15 Flowers
26.3.3 Bumblebee Problem with 20 Flowers
26.3.4 Bumblebee Problem with 25 Flowers
26.3.5 Summary for Bumblebee Problem
26.4 Obstacle-Avoiding-Robot Problem
26.5 Minesweeper Problem
26.6 Artificial Ant Problem
26.7 Discussion
27 Conclusion
Appendix A: List of Special Symbols
Appendix B: List of Special Functions
Appendix C: List of Fonts
Appendix D: Default Parameters
Appendix E: Computer Implementation
E.1 Problem Specific Code for Boolean Even-5-Parity Problem
E.2 Kernel
Appendix F: Annotated Bibliography of Genetic Programming
F.1 Design
F.1.1 Design of Stack Filters and Fitting Chaotic Data
F.2 Pattern Recognition and Classification
F.2.1 Feature Discovery and Image Discrimination
F.2.2 Pattern Recognition using Automatically Defined Features
F.2.3 Upgrading Rules for an OCR System
F.2.4 Prediction of Secondary Structure of Proteins
F.2.5 The Donut Problem
F.2.6 Evolution of a Model for a Jetliner
F.3 Robotic Control
F.3.1 Crawling and Walking of a Six-Legged Creature
F.3.2 Evolution of Herding Behavior
F.3.3 Obstacle-Avoiding Behavior
F.3.4 Corridor-Following and the Lens Effect
F.3.5 Control of Autonomous Robots
F.3.6 Evolution of Cooperation among Autonomous Robots
F.3.7 Incorporating Domain Knowledge into Evolution
F.3.8 Monitoring Strategy for Independent Agents
F.3.9 Genetic Planner for Robots
F.3.10 AI Planning Systems
F.4 Neural Networks
F.4.2 Synthesis of Sigma-Pi Neural Networks
F.4.3 New Learning Rules for Neural Networks
F.5 Induction and Regression
F.5.1 Induction of Regular Languages
F.5.2 Levenberg-Marquardt Regression
F.5.3 Multiple Steady States of a Dynamical System
F.5.4 Inverting and Co-Evolving Randomizers
F.5.5 Adaptive Learning using Structured Genetic Algorithms
F.5.6 Minimum Description Length and Group Method of Data Handling
F.5.7 Sequence Induction
F.6 Financial
F.6.1 Horse Race Prediction
F.6.2 Double Auction Market Strategies
F.6.3 C++ Implementation
F.7 Art
F.7.1 Interactive Evolution of Equations of Images
F.7.1.1 Genetic Art in Virtual Reality
F.7.2 Jazz Melodies from Case-Based Reasoning and Genetic Programming
F.8 Databases
F.8.1 News Story Classification by Dow Jones
F.8.2 Building Queries for Information Retrieval
F.9 Algorithms
F.9.1 Evolution of the Schedule for Simulated Annealing
F.9.2 Sorting Programs
F.10 Natural Language
F.10.1 Word Sense Disambiguation
F.10.2 Classification of Swedish Words
F.11 Modules
F.11.1 Module Acquisition and the Genetic Library Builder
F.11.2 Modules and Automatically Defined Functions
F.11.3 Learning by Adapting Representations
F.12 Programming Methods
F.12.1 Directed Acyclic Graphs for Representing Populations of Programs
F.12.2 Co-Routine Execution Model
F.12.3 Stack-Based Virtual Machine
F.13 Variations in Genetic Operations
F.13.1 Context-Preserving Crossover
F.13.2 Brood Selection and Soft Selection
F.13.3 Implementation in C++
F.13.4 Effect of Locality
F.13.5 Biologically Motivated Representation of Programs
F.13.6 Niches
F.13.7 Recombination and Selection
F.13.8 Strongly Typed Genetic Programming
F.14 Memory, State, and Mental Models
F.14.1 Evolution of Indexed Memory
F.14.2 Map-Making and Map-Using
F.15 Theoretical Foundations
F.15.1 Evolution of Evolvability
F.15.2 Fitness Landscapes and Difficulty
F.15.3 Schema in Genetic Programming
F.15.4 Turing Completeness
Appendix G: Electronic Mailing List and Public Repository
Bibliography
Index
Preface
ORGANIZATION OF THE BOOK
Chapter 1 introduces the eight main points to be made, with section 1.1 providing an overview of the book.
Chapter 2 provides a brief tutorial on the conventional genetic algorithm,
the LISP programming language, genetic programming, and sources of additional information on the entire field of evolutionary computation. (The reader
who is already familiar with these subjects may decide to skip this chapter
entirely.)
Chapter 3 discusses the three-step hierarchical problem-solving process.
Chapter 4 lays the groundwork for all the problems to be described later.
Using a simple problem (the two-boxes problem), section 4.2 illustrates how
genetic programming without automatically defined functions is applied to
a problem. (This section may be skipped by a reader who is already familiar
with the process.) Sections 4.4 and 4.5 introduce the ideas of subroutines and
automatically defined functions (ADFs). Section 4.6 illustrates the preparatory steps for applying automatically defined functions to a problem. Section
4.8 explains structure-preserving crossover and the branch typing technique
used throughout the first three-quarters of this book. Section 4.10 explains
how the size (average structural complexity) of the genetically evolved solutions to problems is measured. Section 4.11 explains the methodology used
for measuring the number of fitness evaluations (the computational effort)
required to yield a solution to a problem with a probability of 99%.
Chapters 5 through 25 solve a variety of problems from a variety of fields,
both with and without automatically defined functions.
Sections 6.7 and 6.8 introduce the ideas of multiple automatically defined
functions and hierarchical automatically defined functions.
Chapter 17 introduces certain computational issues in molecular biology.
Section 18.1 introduces transmembrane domains in proteins.
Section 18.3 discusses memory and states in genetically evolved programs.
Section 18.4 introduces the idea of restricted iteration in genetic programming.
Section 19.1 contains background on omega loops in proteins.
Appendix A is a list of the special symbols used in the book.
Appendix B is a list of special functions defined in the book.
Appendix C is a list of type fonts used in the book.
Appendix D contains the default parameters used to control the runs of
genetic programming reported in this book.
Appendix E contains Common LISP computer code for implementing
automatically defined functions.
Appendix F is an annotated bibliography on genetic programming.
Appendix G contains information on an electronic mailing list, public
repository, and FTP site for genetic programming.
VIDEOTAPE ASSOCIATED WITH THIS BOOK
A color VHS videotape entitled Genetic Programming II Videotape: The Next
Generation by John R. Koza and James P. Rice is available from The MIT Press.
This videotape provides an overview of this book and a visualization of actual computer runs for many of the problems discussed in this book. The
videotape is available in three formats: NTSC (ISBN 0-262-61099-X), PAL (ISBN
0-262-61100-7), and SECAM (ISBN 0-262-61101-5). The videotape may be ordered by mail from The MIT Press, 55 Hayward Street, Cambridge, Massachusetts 02142 USA; by telephone at 617-625-8569 or 800-356-0343; by electronic
mail at mitpress-orders@mit.edu; or by FAX at 617-625-9080. In addition, the 1992 book Genetic Programming: On the Programming of Computers by
Means of Natural Selection by John R. Koza (ISBN 0-262-11170-5) and the 1992
videotape Genetic Programming: The Movie by John R. Koza and James P. Rice
(ISBN 0-262-61084-1 for NTSC format, ISBN 0-262-61087-6 for PAL format,
and ISBN 0-262-61088-4 for SECAM format) are also available from The
MIT Press.
Acknowledgments
James P. Rice of the Knowledge Systems Laboratory at Stanford University
brought his exceptional knowledge in programming LISP machines to the
programming of the problems in this book. In addition, he created all the
artwork for the figures in this book and made innumerable helpful comments
and suggestions on this book.
Martin A. Keane of Keane Associates in Chicago, Illinois conceived the
impulse response problem and made numerous helpful suggestions to improve this book.
Douglas L. Brutlag of the Biochemistry Department of Stanford University
was helpful in explaining various issues concerning biochemistry and
molecular biology.
Stewart W. Wilson of the Rowland Institute for Science in Cambridge, Massachusetts provided continuing encouragement for the work here.
I am indebted for many helpful comments and suggestions made by the
following people concerning various versions of the manuscript:
• David Andre of Canon Research Center of America, Palo Alto, and the Computer Science Department, Stanford University
• Peter J. Angeline of Loral Federal Systems Company, Owego, New York
• Jason Bluming of Enterprise Integration Technologies, Palo Alto, California
• Scott Clearwater of Xerox PARC, Palo Alto, California
• Robert J. Collins of USAnimation, Inc., Los Angeles
• Patrik D'haeseleer of LSI Logic, Mountain View, California
• Justin Gray of Alysis Software Corporation, San Francisco
• Frederic Gruau of the Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon in Lyon, France
• Simon Handley of the Computer Science Department, Stanford University
• David A. Hinds of the Department of Cell Biology, Stanford University
• Kent Hoxsey of Haiku, Hawaii
• Hitoshi Iba of the Machine Inference Section of the Electrotechnical Laboratory of Japan
• Jan Jannink of the Computer Science Department, Stanford University
• Christopher Jones of Cornerstone Research, Menlo Park, California
• Chin H. Kim of Rockwell International, Downey, California
• Kenneth E. Kinnear, Jr. of Adaptive Computing Technology, Boxboro, Massachusetts
• Tod Klingler of the Section on Medical Informatics of the Biochemistry Department of Stanford University
• W. B. Langdon of the Computer Science Department of University College, London
• Martin C. Martin of Carnegie Mellon University
• Sidney R. Maxwell III of Borland International, Scotts Valley, California
• Melanie Mitchell of the Santa Fe Institute, Santa Fe, New Mexico
• Nils Nilsson of the Computer Science Department, Stanford University
• Thomas Ngo of Interval Research, Palo Alto
• Howard Oakley, Institute of Naval Medicine, United Kingdom
• Tim Perkis of Antelope Engineering, Albany, California
• John Perry of Cadence Design Systems, San Jose, California
• Craig W. Reynolds of Electronic Arts, San Mateo, California
• Justinian Rosca of the Computer Science Department, University of Rochester
• Malcolm Shute of the University of Brighton, England
• Eric Siegel of the Computer Science Department, Columbia University
• Jerry Tsai of the Department of Cell Biology, Stanford University
• Walter Alden Tackett of Hughes Missile Systems
• Rao Vemuri of the Department of Applied Science, University of California, Davis
John R. Koza
Computer Science Department
Stanford University
Stanford, CA 94305 USA
E-MAIL: Koza@Cs.Stanford.Edu
Genetic Programming II
Introduction
Genetic Programming: On the Programming of Computers by Means of Natural
Selection (hereafter referred to as Genetic Programming) proposed a
possible answer to the following question, attributed to Arthur Samuel in
the 1950s:
How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed
to be done, without being told exactly how to do it?
Genetic Programming demonstrated a surprising and counterintuitive answer
to this question: computers can be programmed by means of natural selection. In particular, Genetic Programming demonstrated, by example and argument, that the domain-independent genetic programming paradigm is capable
of evolving computer programs that solve, or approximately solve, a variety
of problems from a variety of fields.
To accomplish this, genetic programming starts with a primordial ooze of
randomly generated computer programs composed of the available programmatic ingredients, and breeds the population using the Darwinian principle
of survival of the fittest and an analog of the naturally occurring genetic
operation of crossover (sexual recombination). Genetic programming combines a robust and efficient problem-solving procedure with powerful and
expressive symbolic representations.
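The crossover operation mentioned above can be illustrated with a minimal sketch. It is written in Python rather than the book's LISP, and it assumes programs are stored as nested tuples of the form (function, argument, ...). The helper names (subtree_paths, get_subtree, replace_subtree, crossover, size) and the two parent programs are hypothetical, chosen only for illustration; this is not the book's kernel.

```python
import random

# A program is a terminal (a string) or a tuple: (function_name, child, child, ...).

def subtree_paths(tree, path=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(mother, father, rng):
    """Pick one random point in each parent and swap the subtrees rooted there."""
    p1 = rng.choice(list(subtree_paths(mother)))
    p2 = rng.choice(list(subtree_paths(father)))
    s1, s2 = get_subtree(mother, p1), get_subtree(father, p2)
    return replace_subtree(mother, p1, s2), replace_subtree(father, p2, s1)

def size(tree):
    """Number of nodes (functions plus terminals) in the program."""
    return sum(1 for _ in subtree_paths(tree))

mother = ("+", ("*", "x", "x"), "y")                 # x*x + y
father = ("-", "x", ("*", "y", ("+", "x", "y")))     # x - y*(x + y)
child1, child2 = crossover(mother, father, random.Random(0))
```

Because the two subtrees are exchanged rather than copied, the offspring together contain exactly as many nodes as the parents did, and the parents themselves are left unchanged.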
This book extends the results in Genetic Programming to larger and more
difficult problems. It focuses on exploiting the regularities, symmetries,
homogeneities, similarities, patterns, and modularities of problem environments by means of automatically defined functions.
An automatically defined function (ADF) is a function (i.e., subroutine, procedure, module) that is dynamically evolved during a run of genetic programming and which may be called by a calling program (e.g., a main program)
that is simultaneously being evolved. Automatically defined functions were
conceived and developed by James P. Rice of the Knowledge Systems Laboratory at Stanford University and myself (Koza and Rice 1992b).
As will be seen, genetic programming with automatically defined functions may solve regularity-rich problems in a hierarchical way.
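To make the relationship between a function-defining branch and a calling program concrete, here is a small Python sketch. Everything in it is an assumption made for illustration: the nested-tuple representation, the names ADF0, ARG0 through ARG2, and the terminals L0 through H1, and the target (the difference between the volumes of two boxes). The program is written by hand here, whereas in genetic programming with ADFs both branches would be evolved.

```python
import operator

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env, adfs):
    """Evaluate a nested-tuple expression.

    env maps terminal names to values; adfs maps each ADF name to a
    (parameter_names, body) pair.
    """
    if isinstance(tree, str):
        return env[tree]
    name, *args = tree
    values = [evaluate(arg, env, adfs) for arg in args]
    if name in adfs:
        params, body = adfs[name]
        # The ADF body sees only its own dummy arguments, not the caller's terminals.
        return evaluate(body, dict(zip(params, values)), adfs)
    return FUNCTIONS[name](*values)

program = {
    # Function-defining branch: ADF0(ARG0, ARG1, ARG2) = ARG0 * ARG1 * ARG2
    "adfs": {"ADF0": (("ARG0", "ARG1", "ARG2"),
                      ("*", "ARG0", ("*", "ARG1", "ARG2")))},
    # Result-producing branch: calls ADF0 twice with different arguments
    "result": ("-", ("ADF0", "L0", "W0", "H0"), ("ADF0", "L1", "W1", "H1")),
}

boxes = {"L0": 3, "W0": 4, "H0": 7, "L1": 2, "W1": 5, "H1": 3}
difference = evaluate(program["result"], boxes, program["adfs"])  # 3*4*7 - 2*5*3
```

The single definition of ADF0 is used twice with different arguments, which is the kind of parameterized reuse the text describes.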
Regularities appear in many problem environments.
• Designers of microprocessor chips reuse certain standard circuits (cells),
each performing the same elementary function throughout the chip.
• Biologists have observed that many mechanisms for performing certain
functions in living things are reused, in identical or similar form, to perform other functions in the same organism or in other organisms.
• In designing a house, architects use certain basic constructions over and
over again in identical or almost identical ways.
• The same techniques are reused at different stations along an assembly line
to weld different parts together.
• Different clerks apply the same procedures of double-entry bookkeeping
to process different streams of transactions.
• Computer programmers invoke a similar process of reuse when they
repeatedly call a subroutine from a calling program.
Complicated systems in the real world typically contain massive amounts
of regularity. Understanding, designing, and constructing large systems
requires, as a practical matter, the leverage gained from the exploitation of
regularity, modularity, and symmetry. For example, writing computer programs would be utterly impractical if programmers had to reinvent, from
scratch, the code for the square root, cosine, array-access, file-handling, and
printing on each separate occasion when they needed those functions.
Similarly, design of a microprocessor chip containing thousands of occurrences of a standard cell would be impractical if the chip designer had to start
from the first principles of electronic design and separately think through the
design of each such cell.
The natural world abounds with instances where the same structure or
behavior recurs in identical or similar form. Cells of living things contain
millions of identical copies of thousands of different function-performing
proteins. Humans contain trillions of such cells, but the entire structure is
specified by chromosomes containing only a few billion bits of information.
The three-dimensional coordinates for each atom, of each protein, of each
copy of a protein, of each cell are not explicitly listed in the chromosomes.
Instead, there is a hierarchical arrangement of structures and substructures
and massive reuse of certain basic constructions.
Problems from complex, regularity-rich environments can often be solved
by applying a three-step hierarchical process. This three-step process may be
viewed in a top-down way and a bottom-up way.
In the top-down way of describing the hierarchical problem-solving process,
one first tries to find a way to decompose a given problem into subproblems.
Second, one tries to solve each of the subproblems. Third, one tries to solve
the original overall problem by using the now-available solutions to the
subproblems. If this process is successful, one ends up with a hierarchical and
modular solution to the problem. The popular technique of divide and conquer
is an example of this three-step problem-solving process.
Figure 1.1 Top-down way of viewing the three-step hierarchical problem-solving process. (Steps shown: decompose, solve subproblems, solve original problem.)
Figure 1.1 depicts the top-down way of viewing this three-step hierarchical process. The original overall problem is shown at the left. In the
step labeled "decompose" near the top left of the figure, the original problem is decomposed into three subproblems. In the step labeled "solve subproblems" in the top middle of the figure, the three subproblems are solved.
Finally, in the step labeled "solve original problem" near the top right, the
solutions of the three subproblems are invoked and assembled into a
solution to the overall problem.
In practice, certain subproblems may be difficult enough to warrant a
recursive reinvocation of the entire three-step process in order to solve them.
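As a familiar illustration of the three steps and their recursive reinvocation, consider merge sort, a standard divide-and-conquer algorithm. It is not an example taken from this book, and the function names below are chosen only for this sketch.

```python
def merge(left, right):
    """Step 3: assemble two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two remainders is non-empty.
    return merged + left[i:] + right[j:]

def merge_sort(items):
    """Sort a list by decomposing it, solving the parts, and assembling the results."""
    if len(items) <= 1:
        return list(items)              # a trivial subproblem is solved directly
    mid = len(items) // 2               # step 1: decompose into two subproblems
    left = merge_sort(items[:mid])      # step 2: solve each subproblem by
    right = merge_sort(items[mid:])     #         reinvoking the whole process
    return merge(left, right)           # step 3: assemble the overall solution

print(merge_sort([5, 2, 9, 1, 5, 6]))
```

Each recursive call applies the same decompose, solve, and assemble steps to a smaller list, and the single merge routine is reused at every level of the hierarchy.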
Computer programmers constantly use this three-step problem-solving
process. In the terminology of computer programming, the process starts when
the programmer analyzes the overall problem and divides it into parts. Second, the programmer writes subprograms (subroutines, procedures, functions) to solve each part of the problem. Third, the programmer writes a calling
program (e.g., the main program) that solves the overall problem by calling
the subprograms. The main program assembles the results produced by the
subprograms into a solution to the overall problem.
Sometimes the task to be performed by a subprogram is itself so complex
that the programmer will choose to reapply the entire three-step problem-solving process to that task. In that event, a subprogram might call one or
more sub-subprograms. The subprogram is then written so as to assemble
the solutions to its sub-subprograms and thereby perform its task.
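The three steps can be made concrete with a familiar divide-and-conquer routine. The following is a minimal sketch in Python, offered only as an illustration; merge sort is our choice of example, and the function names are hypothetical rather than taken from this book.

```python
def merge(left, right):
    """Step 3: assemble two sorted sub-solutions into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Sort a list by decomposing, solving the parts, and assembling."""
    if len(items) <= 1:
        return list(items)                 # trivially solved subproblem
    middle = len(items) // 2               # step 1: decompose
    left = merge_sort(items[:middle])      # step 2: solve each subproblem,
    right = merge_sort(items[middle:])     # recursively reapplying the process
    return merge(left, right)              # step 3: assemble


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

The recursive calls correspond to the reinvocation of the entire three-step process on a subproblem, and the single subprogram merge is reused at every level of the hierarchy.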
This three-step process may be beneficial in two ways. The total effort required to decompose the problem into subproblems, solve the subproblems,
and assemble the solutions to the subproblems into a solution of the overall
problem often proves to be less than the effort required to solve the original
problem without the aid of the hierarchical process. In addition, if the decomposition has been done astutely, the solutions to the subproblems will often
be reusable many times (either identically or with a slight variation) in building up the solution to the overall problem. Reuse may lead to simpler and
smaller (more parsimonious) solutions. Of course, if a beneficial decomposition
cannot be found or there are no opportunities for reuse, the three-step process
is counterproductive.
In the bottom-up way of describing the hierarchical three-step problem-solving process, we first try to discover useful regularities and patterns at the
lowest (given) level of the problem environment. Second, we change the representation of the problem and restate it in terms of its inherent regularities
and patterns, thus creating a new problem. Third, we solve the presumably
more tractable recoded problem. If this process of finding regularities and
recoding is successful, one ends up with a hierarchical solution to the problem.
The recoding of the original problem is a change of representation from the
original representation of the problem to a new representation.
Figure 1.2 Bottom-up way of viewing the hierarchical three-step problem-solving process.
Regularities and patterns are, of course, most useful if they reappear many
times in the problem environment.
Previously non-obvious regularities often become apparent when there is
such a change of representation. In practice, the process of discovering a solution to the recoded problem may itself require further discovery of regularities and patterns and additional recoding.
As before, this hierarchical process is considered productive only if the total
effort required to identify the regularities, change the representation, and solve
the new problem is less than the effort required to solve the original problem
without the aid of the three-step process.
Figure 1.2 shows the original representation of a problem, three recoding
rules for changing the representation of the problem, the new representation
of the problem, and a solution to the problem. The step labeled "identify regularities" near the top left of the figure identifies three recoding rules that can
be applied to the problem environment. The step labeled "change representation" in the top middle of the figure recodes the original problem using the
three just-discovered recoding rules and creates a new representation of the
problem. Finally, the step labeled "solve" near the top right solves the problem as restated in terms of the new representation.
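A toy instance of the bottom-up view may help. The sketch below is our own illustration, not an example from this book: it treats runs of identical adjacent values as the regularity, recodes the data as (value, run length) pairs, and then solves a simple problem (summing the values) in the new representation. The function names are hypothetical.

```python
from itertools import groupby


def change_representation(cells):
    """Steps 1 and 2: find runs of equal adjacent values and recode
    the data as a list of (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(cells)]


def solve_recoded(runs):
    """Step 3: solve the problem (here, a sum) in the new representation."""
    return sum(value * length for value, length in runs)


original = [3, 3, 3, 3, 0, 0, 7, 7, 7]
runs = change_representation(original)
print(runs)                  # the recoded problem
print(solve_recoded(runs))   # same answer as summing the original list
```

The recoding pays off only when runs are long and frequent. For data with no repeated adjacent values, the new representation is larger than the original and the three-step process is counterproductive.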
I believe that the goal of getting computers to solve problems without being
explicitly programmed requires the exploitation of regularities and modularities in a hierarchical way. Large complex problems are generally not solved
by individually crafting each minute part of the overall solution. Automatic
programming seems unlikely to be realized for large problems if each part of
the overall solution to a problem is handled as a unique event that is never to
be seen again. Hierarchical organization and reuse seem to be required if
automatic programming is ever to be scaled up from small problems to large
problems.
The hierarchical three-step problem-solving process described above
offers an alluring way to gain the leverage that is needed to solve large problems. However, the question immediately arises as to how one can go about
implementing this process in an automated and domain-independent way.
Implementation of the top-down approach to the hierarchical process calls
for answers to the following:
• How does one go about decomposing a problem into subproblems?
• Once the subproblems have been identified, how does one solve the
subproblems?
• Once the subproblems have been identified and solved, how does one
assemble the solutions of the subproblems into a solution to the original
overall problem?
The bottom-up approach requires answers to these implementation issues:
• How does one go about finding regularities at a low level of the problem
environment?
• Once the regularities have been identified, how does one recode the original problem into a new problem in terms of these regularities (i.e., how
does one change the representation)?
• Once the regularities have been identified and the recoding has been done,
how does one solve the original problem as now framed in terms of the
new representation?
The reader of Genetic Programming will recognize that the discovery of a
solution to a subproblem (i.e., the second step of the top-down approach) can
often be accomplished by means of genetic programming. Indeed, Genetic
Programming demonstrated that a broad range of problems can be solved, or
approximately solved, by genetically breeding a population of computer programs over a period of many generations.
But what about the other steps of the process? How are they to be performed in an automated and domain-independent way? More important,
even if the individual steps can be performed separately, how are they to be
integrated with one another?
The surprising and counterintuitive result that will be demonstrated in this
book is that, for a variety of problems, all three steps of the hierarchical problem-solving process can be performed, automatically and dynamically, within a
run of genetic programming when automatically defined functions are added
to the toolkit of genetic programming.
The technique of automatically defined functions enables genetic programming to automatically discover useful functional subunits dynamically during a run. The concurrent evolution of functional subunits and calling
programs enables genetic programming to realize (in an implicit manner) the
entire three-step hierarchical problem-solving process described above automatically within a run of genetic programming.
Starting from a primordial ooze of randomly generated compositions of
programmatic ingredients, genetic programming with automatically defined
functions simultaneously evolves the functional subunits and coadapted calling programs by employing the Darwinian principle of survival and reproduction of the fittest and genetic crossover. As in Genetic Programming, programming is done by means of natural selection; the program structure that
solves the problem arises from fitness.
The realization by genetic programming of the three-step hierarchical problem-solving process occurs concurrently, not temporally (as the phrase "three
steps" might suggest). More precisely, one can interpret the results produced
by genetic programming with automatically defined functions as a realization
of the three-step process. Genetic programming with automatically defined
functions does not, in fact, explicitly perform any of the three steps (either of
the top-down or bottom-up formulation). That is, there is no explicit decomposition of the original problem into subproblems; there is no separate solution of subproblems; and there is no explicit assembly of solutions to
subproblems into a solution to the overall problem. Similarly, there is no
explicit search or discovery of patterns, no change of representation, and no
separate solution of any new problem expressed in any higher-level representation. Instead, hierarchical decomposition and changed representation
are emergent properties that we impute to the results produced by genetic
programming with automatically defined functions.
If it is indeed possible to solve a problem by simultaneously evolving a
calling program and one or more subroutines, the question immediately arises
as to whether this process delivers any benefits in terms of the amount of
computation necessary to discover the solution or in terms of the parsimony
of the evolved solutions.
The evidence, provided by examples and argument in this book, supports
the following eight main points:
Main point 1: Automatically defined functions enable genetic programming to solve a variety of problems in a way that can be interpreted as a
decomposition of a problem into subproblems, a solving of the subproblems,
and an assembly of the solutions to the subproblems into a solution to the
overall problem (or which can alternatively be interpreted as a search for
regularities in the problem environment, a change of representation, and a
solving of a higher level problem).
Main point 2: Automatically defined functions discover and exploit the
regularities, symmetries, homogeneities, similarities, patterns, and modularities of the problem environment in ways that are very different from the style
employed by human programmers.
Main point 3: For a variety of problems, genetic programming requires
less computational effort to solve a problem with automatically defined functions than without them, provided the difficulty of the problem is above a
certain relatively low problem-specific breakeven point for computational
effort.
Main point 4: For a variety of problems, genetic programming usually
yields solutions with smaller overall size (lower average structural complexity) with automatically defined functions than without them, provided the
difficulty of the problem is above a certain problem-specific breakeven point
for average structural complexity.
Main point 5: For the three problems herein for which a progression of
several scaled-up versions is studied, the average size of the solutions produced by genetic programming increases as a function of problem size at a
lower rate with automatically defined functions than without them.
Main point 6: For the three problems herein for which a progression of
several scaled-up versions is studied, the computational effort increases as a
function of problem size at a lower rate with automatically defined functions
than without them.
Main point 7: For the three problems herein for which a progression of
several scaled-up versions is studied, the benefits in terms of computational
effort and average structural complexity conferred by automatically defined
functions increase as the problem size is scaled up.
Main point 8: Genetic programming is capable of simultaneously solving
a problem and evolving the architecture of the overall program.
1.1 OVERVIEW
The general approach of this book is to produce evidence supporting the eight
main points by solving a number of illustrative problems from various fields,
with and without automatically defined functions.
Chapter 2 provides a brief tutorial on the conventional genetic algorithm,
the LISP programming language, and genetic programming. Section 2.4 itemizes sources of additional information for the field of evolutionary computation. (The reader who is already familiar with these subjects may decide to
skip this chapter.)
Chapter 3 further explains the three-step hierarchical problem-solving
process.
Chapter 4 lays the groundwork for all the problems to be described later
using a simple illustrative problem. The two-boxes problem presents the
opportunity to define a useful functional subunit and to use that subunit twice
in solving the problem.
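The idea of a functional subunit that is used twice can be sketched as follows. This is a hand-written Python illustration, not an evolved program; it assumes that the target of the two-boxes problem is the difference between the volumes of two boxes, and the names adf0 and result_producing_branch are hypothetical labels for the subunit and its calling program.

```python
def adf0(arg0, arg1, arg2):
    """Function-defining branch: a reusable subunit taking three dummy
    variables (assumed here to compute the volume of one box)."""
    return arg0 * arg1 * arg2


def result_producing_branch(l0, w0, h0, l1, w1, h1):
    """Calling program: invokes the subunit twice with different
    instantiations of its arguments."""
    return adf0(l0, w0, h0) - adf0(l1, w1, h1)


def without_subunit(l0, w0, h0, l1, w1, h1):
    """The same computation with the repeated pattern written out twice."""
    return l0 * w0 * h0 - l1 * w1 * h1


print(result_producing_branch(3, 4, 7, 2, 5, 3))
print(without_subunit(3, 4, 7, 2, 5, 3))
```

Under this assumption the subunit saves very little: it is reused only once and encapsulates only two multiplications, so the version without the subunit is barely longer.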
Sections 4.2 and 4.3 illustrate the successful application of genetic programming without automatically defined functions to solve the two-boxes problem. (This review of genetic programming may be skipped by the reader who
is already familiar with the process.)
Sections 4.4 and 4.5 introduce the ideas of subroutines and automatically
defined functions.
Section 4.6 describes the preparatory steps for applying automatically
defined functions to a problem.
Section 4.7 explains the method of creating the initial random population
with automatically defined functions.
Section 4.8 explains structure-preserving crossover and the branch typing
technique used throughout the first 20 chapters of this book.
Section 4.9 shows the results with automatically defined functions. This
section shows that it is possible to simultaneously evolve both a functional subunit and a coadapted calling program dynamically during a run
in order to solve a problem. In other words, genetic programming works
with automatically defined functions. If automatically defined functions work
at all for this problem, one naturally begins to wonder whether they yield
some economy in terms of the computational burden necessary to solve a
problem.
Section 4.10 explains the measure of average structural complexity, S, used
to measure the size of the solutions produced by genetic programming.
Section 4.11 explains the methodology used in creating the performance
curves for measuring the number of fitness evaluations required to yield a
solution (or satisfactory result) for a problem with a satisfactorily high
probability (say 99%). The performance curves permit calculation of a measure of computational effort, E, for a problem.
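The full definitions are given in section 4.11. As a rough sketch only, the following Python code assumes the form in which this kind of measure is commonly stated: if P is the observed cumulative probability that a run of population size M has succeeded by generation i, then the number of independent runs needed to succeed with probability z is the ceiling of log(1 - z) / log(1 - P), the number of fitness evaluations is M times (i + 1) times that number of runs, and E is the minimum of this quantity over all generations. The function names and the sample probabilities are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """Independent runs needed so that at least one run succeeds
    with probability z, given per-run success probability p_success."""
    if p_success >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def computational_effort(cumulative_success, population_size, z=0.99):
    """Minimum, over generations, of the fitness evaluations needed.

    cumulative_success[i] is the assumed cumulative probability of
    success by generation i (generation 0 is the initial population).
    """
    best = math.inf
    for generation, p in enumerate(cumulative_success):
        if p <= 0.0:
            continue  # no run has succeeded this early
        evaluations = population_size * (generation + 1) * runs_required(p, z)
        best = min(best, evaluations)
    return best


# Made-up curve: no success at generation 0, 50% by generation 1, 75% by generation 2.
print(computational_effort([0.0, 0.5, 0.75], population_size=100))
```

With these made-up numbers, stopping every run at generation 2 and making four runs is cheaper than stopping at generation 1 and making seven, which is why the measure is taken as a minimum over generations.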
One of the reasons why it is desirable to solve a problem using automatically defined functions is to avoid repetitively solving and re-solving identical or similar subproblems. Unfortunately, when the performance of genetic
programming is compared, with and without automatically defined functions, for the two-boxes problem in chapter 4, we are disappointed to find
that genetic programming with automatically defined functions is at a distinct
disadvantage both in terms of the number of fitness evaluations required to
yield a solution with 99% probability and the average size of the evolved
solutions. The reason for this disappointing result for this particular problem
appears to be that the two-boxes problem offers the opportunity for only the
barest amount of reuse (only one reuse of only one subroutine) and only
the barest amount of reused code within the subroutine (only two
multiplications).
The tide turns in chapter 5. We show there that automatically defined functions can indeed reduce the computational effort required to solve a problem.
This chapter compares a simple version and a scaled-up version of four different problems. The problems illustrate four different dimensions for scaling: the order of a polynomial, the number of arguments to a Boolean function,
the number of harmonics of a sinusoidal function, and the frequency of use of
π in an algebraic expression. All eight versions are solved both with and without automatically defined functions, thus producing 16 series of runs. Genetic
programming is able to solve all eight versions, both with and without automatically defined functions.
When we analyze the 16 sets of results, we find that automatically defined
functions are disadvantageous as measured by computational effort for the
simpler version of each problem, but become advantageous for the scaled-up
version of the same problem. The reason appears to be that the simpler versions of the four problems are too simple to overcome the overhead associated
with automatically defined functions. There is insufficient regularity in the
simpler versions of the four problems to make automatically defined functions beneficial. In contrast, the scaled-up version of each problem is sufficiently difficult to benefit (often just slightly) from automatically defined
functions. Each of these four problems apparently straddles a breakeven point
in computational effort.
The problems in the remaining chapters are distinctly on the beneficial side
of the breakeven point for computational effort.
Chapter 6 considers the problem of symbolic regression of the Boolean even-parity function with a progressively increasing number of arguments.
In sections 6.3 through 6.6, a baseline is established for solving the even-3-, 4-, 5-, and 6-parity problems without automatically defined functions using
a fixed population size of 16,000.
Section 6.7 introduces the idea of multiple automatically defined functions and section 6.8 introduces the hierarchical version of automatically
defined functions.
The even-3-, 4-, 5-, and 6-parity problems are then solved with automatically defined functions. The substantial symmetry and modularity of this
problem environment means that there are considerable opportunities for
decomposing the problem into subproblems, solving the subproblems, and
assembling the solutions to the subproblems into a solution of the problem as
a whole. Automatically defined functions prove to be beneficial in terms of
computational effort in solving this progression of problems.
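The kind of modularity available in the parity problem can be seen in a hand-written decomposition. The Python sketch below is our own illustration, not an evolved solution: a single two-argument subunit (hypothetically named adf0) is reused once per argument to build even-parity for any number of arguments, and the result is checked against the definition for all 32 combinations of five arguments.

```python
from itertools import product


def adf0(a, b):
    """Reusable two-argument subunit: true when exactly one input is true."""
    return a != b


def even_parity(bits):
    """True when an even number of the arguments are true,
    computed by invoking the subunit once per argument."""
    odd = False
    for bit in bits:
        odd = adf0(odd, bit)
    return not odd


# Check against the definition for every combination of five arguments.
cases = list(product([False, True], repeat=5))
print(all(even_parity(bits) == (sum(bits) % 2 == 0) for bits in cases))
```

A solution written without such a subunit must handle each additional argument with new code, whereas here the calling pattern simply repeats. That is the opportunity for reuse that grows as the number of arguments is scaled up.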
Even though the even-6-parity problem without automatically defined functions was never solved with a population size of 16,000, the advantages of
automatically defined functions enable the even-7-, 8-, 9-, 10-, and 11-parity
problems to be solved using a population of only 4,000.
Automatically defined functions usually prove to be beneficial in terms of
the parsimony of the solutions produced by genetic programming.
As the even-parity problem is scaled up from 3, to 4, to 5, and to 6 arguments, the growth in the average size of the solutions is only about half as
large with automatically defined functions as without them. As the even-parity problem is scaled up, the growth in the computational effort is also
considerably less with automatically defined functions than without them.
In all of the problems mentioned above, we chose the number of automatically defined functions and the number of arguments that they would each
possess in the overall program. There are a number of practical techniques
that can be used in making these architectural choices. The reader might wonder whether such initial architectural choices are important in determining
whether genetic programming is capable of solving a problem.
In chapter 7, we solve the even-5-parity problem using 15 different combinations of the number of automatically defined functions and the number of
arguments. The result is that genetic programming solves the problem
regardless of the choice of architecture. The required computational effort
varies somewhat among the 15 architectures; however, the computational
effort with automatically defined functions is less for all 15 architectures than
the computational effort without automatically defined functions.
The origin of the illustrative problems presented in this book is worth mentioning. Finding problems suitable for exploring the question of how to discover and exploit regularities of problem environments proved to be a difficult,
but necessary preliminary task to doing the experimental research described
in this book. There are two reasons for this.
First, ever-present considerations of available computer time played a dominant role in the selection and formulation of problems. When we talk about
computer time, we are not talking merely about the time required to make
one run of a problem. The general approach of this book is to compare the
average performance in solving a problem, both with and without automatically defined functions. Consequently, a problem is suitable for this book only
if it is solvable within a certain maximum number of generations, both with
and without automatically defined functions. Because genetic programming
is a probabilistic algorithm and not every run is successful within the
allowed maximum number of generations, getting a successful run usually
takes more than one run. Again, because genetic programming is probabilistic, measuring performance requires that multiple, successful runs must be
produced, both with and without automatically defined functions. The controlling constraint is the time required for the multiple, successful runs for
whatever version of a problem proves to be the slowest (which, in practice,
usually turns out to be when automatically defined functions are not being
used). Runs of problems in this book can be very slow indeed (often requiring several days each). Indeed, the runs documented in this book took about
four years of computer time.
In addition, we wanted at least some of the problems in this book to be
scalable along some dimension. Our desire to study scaling experimentally
further increased our requirements for computer time. We needed problems
for which multiple, successful runs, both with and without automatically defined
functions, for a progression of several scaled-up versions of the problem could be
made within a reasonable amount of computer time. We were only able to
find three problems for which we could make a range of comparisons within
a reasonable total amount of computer time: the even-3-, 4-, 5-, and 6-parity
problems (chapter 6); the lawnmower problem with lawn sizes of 32, 48, 64,
80, and 96 (chapter 8); and the bumblebee problem with 10, 15, 20 and 25
flowers (chapter 9).
Of course, there is nothing unusual about the fact that the phenomena under
study are barely detectable with the available instrumentation. Each enhancement in the power of telescopes, microscopes, particle accelerators, and virtually every other scientific instrument has enabled new questions to be
experimentally examined. The new questions are, of course, usually at the
edge of what is detectable by the latest piece of equipment.
A second reason for the difficulty in finding suitable benchmark problems
concerns the scope of recent work in the fields of machine learning, artificial
intelligence, and neural networks. Many of the problems in Genetic Programming were benchmark problems that had been the focus of considerable previous research. This is not the case in this book. Only the Boolean parity and
symmetry problems have an extensive history; only a few other problems in
this book have even a modest history (e.g., the artificial ant problem of chapter 12). In most instances, we had to construct suitable problems. The reason
for this is that existing paradigms from the fields of machine learning, artificial intelligence, and neural networks have generally not proved to be capable
of discovering and exploiting regularities and symmetries in the way that
automatically defined functions do. Consequently, researchers in those fields
have usually turned a blind eye to regularity-rich problems. Such problems
have only rarely appeared as benchmark problems in these fields. The seeming exception (the Boolean parity problem) is the exception that proves the
rule. The parity problem usually appears in the literature not because its problem environment is replete with regularities, but because it is difficult to learn
(since changing any one input always toggles the output). Published solutions to the parity problem usually do not solve the problem by discovering
and exploiting the interesting regularities in this problem environment. Instead, the parity problem is typically used to show that a particular paradigm
is powerful enough to overcome the difficulties of the problem and to solve it
(usually without discovering or exploiting the very symmetry that makes the
problem interesting to us).
For these reasons, we found it necessary to construct several additional
regularity-rich problems for testing automatically defined functions. The first
of these (the lawnmower problem in chapter 8) was specifically designed to
• be much faster to run than the parity problem (it can be run with a population size of only 1,000, rather than 16,000 or 4,000),
• have exploitable regularities,
• be hard enough to have interesting hierarchical decompositions,
• have a sufficiently rich and varied function set to enable the problem to be
solved in many distinctly different ways, using many distinct programming
styles and motifs,
• be on the beneficial side of the breakeven point for computational effort,
• be on the beneficial side of the breakeven point for average structural
complexity,
• be scalable in some dimension, and
• be so much faster to solve that we could say, in spite of all of the
uncertainties inherent in measuring wallclock time, that this problem is
clearly on the beneficial side of the breakeven point for wallclock time when
automatically defined functions are used.
In the lawnmower problem, the goal is to find a program for controlling
the movement of a lawnmower so that the lawnmower cuts all the grass in a
homogeneous, unobstructed yard. The lawnmower problem is scaled in terms
of the size of the lawn. Lawn sizes of 32, 48, 64, 80, and 96 are considered.
In addition to demonstrating scaling, the lawnmower problem of chapter 8
illustrates another interesting aspect of hierarchical computer programming.
In chapters 4 through 7, information is transmitted to the genetically evolved
reusable subprograms solely by means of explicit arguments. The automatically defined functions are usually repeatedly invoked with different
instantiations of these explicit arguments. When transmitted values are
received by an automatically defined function, they are bound to dummy
variables (formal parameters) that appear locally inside the function. An
alternative to this explicit transmission of information to a subprogram is the
implicit transmission of information by means of side effects on the state of a
system. In the lawnmower problem considered in this chapter, one of the two
automatically defined functions takes no explicit arguments.
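Implicit transmission through the state of a system can be sketched as follows. This Python code is a hand-written illustration under our own assumptions, not the formulation of chapter 8: it assumes a toroidal 8-by-8 lawn, two hypothetical side-effecting operations (left and mow), and one zero-argument subroutine (hypothetically named adf0) that reads and alters the shared mower state.

```python
class Lawn:
    """Assumed toroidal grid; the mower's position and heading are the
    shared state through which information is transmitted implicitly."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.row, self.col = 0, 0
        self.d_row, self.d_col = 0, 1      # initially facing east
        self.mowed = set()

    def left(self):
        """Side effect: turn 90 degrees counterclockwise."""
        self.d_row, self.d_col = -self.d_col, self.d_row

    def mow(self):
        """Side effect: move one square forward (wrapping) and cut it."""
        self.row = (self.row + self.d_row) % self.rows
        self.col = (self.col + self.d_col) % self.cols
        self.mowed.add((self.row, self.col))


lawn = Lawn(8, 8)  # state visible to both the subroutine and its caller


def adf0():
    """Zero-argument subroutine: mow one full row from wherever the
    mower currently is, in whatever direction it currently faces."""
    for _ in range(lawn.cols):
        lawn.mow()


def main_program():
    """Calling program: reuse the subroutine once per row."""
    for _ in range(lawn.rows):
        adf0()
        lawn.left(); lawn.left(); lawn.left()   # face south
        lawn.mow()                              # step into the next row
        lawn.left()                             # face east again


main_program()
print(len(lawn.mowed))
```

The subroutine receives nothing through its argument list, yet each invocation does different work because the state left behind by the calling program differs each time.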
Genetic programming is capable of solving the lawnmower problem, both
with and without automatically defined functions for all five sizes of lawn
(sections 8.3 through 8.7 and sections 8.9 through 8.13). Section 8.14
consolidates the experimental evidence and shows that, for any of the given
lawn sizes, substantially less computational effort is required with automatically defined functions than without them. Moreover, the average size of the
programs that successfully solve the problem is considerably smaller when
using automatically defined functions than when not using them.
Section 8.15 considers the specific numerical amounts by which genetic
programming with automatically defined functions outperforms genetic programming without automatically defined functions. When the problem size
is scaled up from 32, through 48, 64, and 80, and eventually to 96, the average
size of the programs that successfully solve the lawnmower problem appears
to be a linear function of problem size, both with and without automatically
defined functions. However, the two linear relationships are different. The
average size without automatically defined functions seems to be a substantial linear multiple of the problem size. However, the average size of the programs that successfully solve the problem with automatically defined functions
seems to consist of a substantial fixed overhead plus a very small linear multiple of the problem size.
When the problem size is scaled between 32 and 96, the computational
effort required for the lawnmower problem without automatically defined
functions increases at an explosively nonlinear rate. However, with automatically defined functions, there appears to be only a linear growth in the required
computational effort.
The above-mentioned measure of computational effort based on the number of fitness evaluations required to solve a problem with a satisfactorily
high probability is only one possible way to measure the computational burden associated with a problem-solving algorithm. Section 8.16 shows that
less wallclock time is required with automatically defined functions than without them.
Chapter 9 considers the bumblebee problem. This problem is scaled along
the axis representing the number of flowers that the bee has to visit. The
bumblebee problem provides an example of a problem in the domain of floating-point numbers. Four progressively more difficult versions of this problem are run, each with and without automatically defined functions.
Automatically defined functions again prove to be beneficial in terms of the
computational effort required to solve the problem and the average structural complexity of the evolved solutions.
The progression of four bumblebee problems is similar to the progression
of parity problems and lawnmower problems in that the computational effort
grows rapidly with problem size without automatically defined functions,
but appears to grow more slowly with automatically defined functions. Similarly, the average structural complexity appears to grow more slowly with
automatically defined functions than without them.
The bumblebee problem illustrates another aspect of genetic programming
with automatically defined functions. In the parity problem and the
lawnmower problem, we were able to understand the genetically evolved
regularities by analyzing the solutions evolved by genetic programming. Even
though the bumblebee problem was designed to contain a considerable
amount of exploitable regularity and modularity, we were unable to understand any discovered regularity by looking at either the genetically evolved
program or the trajectory of the bee. Nonetheless, we believe that regularities
exist in the genetically evolved solutions employing automatically defined
functions because the comparative statistics provide indirect evidence of the
discovery and exploitation of some regularity (not necessarily one contemplated by us) in the problem environment.
Chapter 10 shows that for the parity problem, the lawnmower problem,
and the bumblebee problem, the advantages in terms of computational effort
and parsimony conferred by automatically defined functions increase as the
problem size is scaled up. In other words, genetic programming with automatically defined functions is scalable automatic programming for the particular problems and ranges of problem sizes that were studied.
Chapter 11 shows how information can be transmitted between a calling
program and a subprogram in yet another way, namely implicit transmission
through a global variable. The problem is to find the impulse response function for a linear time-invariant system. The fact that the subprograms are real-valued functions of a single variable permits the genetically evolved
automatically defined functions to be visualized graphically. The genealogical audit trails in section 11.7 illustrate the way that crossover works to evolve
improved programs in a population. The impulse-response problem provides
another example of a problem in the domain of floating-point numbers.
The artificial ant problem considered in chapter 12 shows a problem
that can be solved using subprograms with no explicit arguments. In all of
the previous problems, at least some information is transmitted to the
reusable subprograms by means of explicit arguments. The subprograms
in these problems are then repeatedly invoked with different instantiations
of the arguments. In this problem, the state of the system is available to
both the subroutine and the calling program and side-effecting operations
alter the state of a system. Information is transmitted between the subroutine and calling program implicitly by means of the current state of the
system. Since the effect of each side-effecting operation depends on the
current state of the system, the state of the system acts as the implicit
arguments to the operation.
The Boolean even-parity problem (chapters 6 and 7), the lawnmower
problem (chapter 8), and the bumblebee problem (chapter 9) all contained
a considerable amount of exploitable regularity. In contrast, the artificial
ant problem in chapter 12 shows that the amount of regularity required in
the problem environment for automatically defined functions to be beneficial can be very modest (consisting of a common inspecting motion
applied in only two directions).
The problem of the obstacle-avoiding robot considered in chapter 13
is similar to the lawnmower problem; however, in this problem obstacles
prohibit the straightforward exploitation of the regularities present in the
problem environment.
The minesweeper problem of chapter 14 is similar to the problems of the
lawnmower and the obstacle-avoiding robot; however, in this problem, the
obstacles are lethal. Consideration of the lethality of the obstacles in the
minesweeper problem is so important that it dominates the considerations
required to find a solution to the problem.
Both the problem of the obstacle-avoiding robot and the minesweeper problem demonstrate the benefits of automatically defined functions in an environment that is more complicated and less homogeneous than the lawnmower
problem.
Chapters 15 through 20 present problems that, when solved using automatically defined functions, illustrate the simultaneous discovery of initially unknown detectors and a way of combining the just-discovered detectors.
The detectors that are dynamically discovered during the run of genetic programming are then repeatedly used in solving the problem.
Chapter 15 considers the problem of identifying the letters I and L on a
6-by-4 pixel grid. The evolved programs consist of hierarchical combinations
of five local detectors. The five automatically defined functions perform local
sensing of a nine-pixel subarea of the overall grid. The main part of the overall
program moves the detectors around the overall grid and integrates the local
sensory input provided by the five detectors.
Section 15.6 studies the genealogical audit trail of a solution to this problem and illustrates the way that crossover works to evolve improved programs in a population. In section 15.7, the same problem is solved using a
mixture of differently sized detectors. Section 15.8 considers a translation-invariant
version of the problem.
Chapter 16 illustrates the automatic discovery of initially unknown detectors for the problem of deciding whether a five-card hand from a pinochle
deck is a flush or a four-of-a-kind. Correlation is introduced in section 16.2 as
a way to measure the fitness of a predicting program (and further discussed
in subsection 18.5.2). This problem paves the way for the subsequent four
chapters (17 through 20) on computational problems in molecular biology
and biochemistry.
The problems of analyzing data associated with the growing databases in
the field of molecular biology appear to be an especially promising area in
which to apply genetic programming.
Complex relationships in data from the real world can often only be
expressed by a combination of mathematical operations of different types.
Some of the underlying relationships in empirical data may be simple linear
relationships; others can be captured only with polynomials, rational polynomials, or other classes of functions. Conditional operations may be required
to segment parts of the space from one another and to create alternative disjoint models of the data. Calculations involving iterations and memory may
also be required to recognize the patterns and relationships in empirical data.
In short, modeling complex empirical data requires the flexibility of computer programs.
Existing methods for pattern recognition, classification, and data mining
usually require that the user commit to the nature of the model before the
modeling process begins. In contrast, in genetic programming, the size and
shape as well as the content of the computer program that models the data is
open to evolution.
I believe that genetic programming with automatically defined functions
is especially well suited to problems of discovering patterns and relationships in empirical data because its expressiveness and flexibility enable it to
find solutions consisting of complex combinations of mathematical operations, conditional operations, iteration, memory, and hierarchical decision making. Moreover, since genetic programming evolves the size and shape as
well as the content of the computer program that solves the problem, it has
the potential to discover unanticipated relationships in empirical data.
Chapter 17 contains an introduction to some of the major current computational issues in biochemistry and molecular biology. Section 17.1 introduces
chromosomes and DNA. The discussion then turns to the role of proteins in
living things (section 17.2), transcription and translation (section 17.3), and
amino acids and protein structure (section 17.4). The primary, secondary, tertiary, and quaternary structures of proteins are introduced in sections 17.5,
17.6, 17.7, and 17.8, respectively. Section 17.9 contains references to the growing number of recent applications of conventional genetic algorithms to molecular biology and biochemistry.
Chapter 18 considers the problem of predicting whether protein segments
are transmembrane domains or non-transmembrane areas of a protein. Our
solution to this problem incorporates the automatic discovery of initially
unknown detectors, restricted iteration, and memory.
Section 18.1 contains background on transmembrane domains in proteins.
Section 18.2 defines the set-creating version of the problem of predicting
whether a protein segment is a transmembrane domain or a non-transmembrane area of a protein.
Mathematical calculations typically employ iterations and memory. Section 18.3 discusses settable variables, memory, state, and setting functions in
genetically evolved programs.
Section 18.4 introduces the idea of restricted iteration in genetic programming. Restricted iteration is a practical way of introducing iteration into populations of genetically evolved computer programs.
The set-creating version of the transmembrane problem in sections 18.5
through 18.9 illustrates the use of settable variables, memory, state, setting
functions, and restricted iteration.
The best predicting program evolved by genetic programming for the set-creating version of the transmembrane problem with automatically defined
functions has a slightly better error rate than four other published results.
This genetically evolved program is an instance of an algorithm produced by
an automated technique which is superior to that written by human
investigators.
The above version of the transmembrane problem was motivated by
and patterned after recent work on this problem employing set formation.
However, absent this other work, it would have been more natural to
approach this problem with computer programs composed of the ordinary arithmetic operations of addition, subtraction, multiplication, and
division and ordinary conditional operations. Sections 18.10 and 18.11
present the arithmetic-performing version of the transmembrane problem.
Again, the predicting program evolved by genetic programming for this second version of the transmembrane problem with automatically defined functions has a slightly better error rate than the same four other benchmark results.
Chapter 19 extends the techniques of the transmembrane problem to another
problem of molecular biology. The problem here is to predict whether or not
a given protein segment is an omega loop. Omega loops are an irregular kind
of secondary structure in proteins. Section 19.1 provides background on them.
There is a set-creating version of the problem (section 19.3) and an arithmetic-performing version (section 19.5).
Chapter 20 extends the two versions of the transmembrane problem from
chapter 18 to a more difficult version of the problem in which the goal is to
predict whether an individual amino acid lies in a transmembrane domain or
a non-transmembrane area. A partial parsing of the entire protein sequence is
employed in this version of the problem using a lookahead technique.
Chapters 21 through 25 deal with the evolutionary determination of the
architecture of genetically evolved programs.
Prior to these chapters, whenever we applied genetic programming with
automatically defined functions to a problem, we first determined the number of function-defining branches of the overall program that is to be
evolved and the number of arguments possessed by each function-defining branch. If there was more than one function-defining branch, we also
determined the nature of the hierarchical references (if any) allowed
between the function-defining branches. Four different ways of making
these architectural choices are used (as described in chapter 7): prospective analysis of the nature of the problem, seemingly sufficient capacity,
affordable capacity, and retrospective analysis of the results of actual runs.
Chapter 7 shows that regardless of which of 15 architectures is employed,
genetic programming with automatically defined functions is capable of
solving the even-5-parity problem and, in addition, that less computational effort is required for all 15 architectures with automatically defined
functions than without them. Nonetheless, the user may, for some problems, be unable or unwilling to use any of these four techniques.
Chapter 21 shows that the architecture of the overall program can be
evolutionarily selected within a run of genetic programming while the
problem is being solved. In the evolutionary method of determining the
architecture of the overall program, the architecture of the overall program is not prespecified. Instead, the initial random population contains
programs with a variety of architectures. The architecture of the eventual
solution is evolutionarily selected by a competitive fitness-driven process
that occurs during the run at the same time as the problem is being solved.
Because the population is architecturally diverse, the technique of branch
typing described in section 4.8 would hamstring the crossover operation.
An alternative, called point typing, is explained in section 21.2. Structure-preserving crossover with point typing permits robust recombination while
simultaneously guaranteeing that architecturally different parents will sire
syntactically and semantically valid offspring.
Section 21.3 presents results for the even-5-parity problem using the evolutionary method of determining the architecture of the overall program. Sections 21.4 and 21.5 present results for the even-4- and even-3-parity problems,
respectively.
In the previous chapters, the user of genetic programming determined a
sufficient set of primitive functions from which the yet-to-be-evolved programs are composed. Suppose that we did not know what set of primitive
functions is sufficient to solve a problem or, for some reason, did not want to
make the decision of determining the set of primitive functions for a problem. One approach might be to choose a set of primitive functions from a
large, presumably sufficient superset. However, suppose we wanted to evolve
a set of primitive functions, rather than merely home in on a subset of primitive functions within a prespecified superset.
Chapter 22 explores the question of whether a sufficient set of primitive
functions (expressed in some elementary way) can be evolutionarily determined during a run at the same time that genetic programming is solving the
problem and selecting the architecture of the overall program. Section 22.2
presents results for the even-5-parity problem using the evolutionary method
of determining a sufficient set of primitive functions and selecting the architecture of the overall program. Section 22.3 presents results for the Boolean 6-multiplexer.
It is interesting to consider whether only one primitive function
is sufficient for solving a problem. Section 22.4 revisits both problems with
the constraint that only one primitive function be used.
In order to evolve a computer program capable of producing the desired
output for a given problem, it is necessary to have access to a set of inputs that
are at least a superset of the inputs necessary to solve the problem (that is, the
terminals must be sufficient for the problem). In all the previous chapters, the
user of genetic programming determined a sufficient set of terminals from
which the yet-to-be-evolved programs are composed.
Chapter 23 considers the question of whether it is possible for genetic programming to determine the terminal set of a problem (in the sense of enabling genetic programming to select the inputs of the yet-to-be-evolved
program from a sufficient superset of available inputs) during a run at the
same time that genetic programming is evolving a sufficient set of primitive
functions, evolutionarily selecting the architecture, and solving the problem.
Section 23.1 shows that this is possible for the even-5-parity problem.
Every function in the function sets of all the foregoing problems has satisfied the closure requirement in that it has been able to accept, as its arguments, any value that may possibly be returned by any function in the function
set that may appear as its arguments and any value that may possibly be
assumed by any terminal in the terminal set of the problem that may appear
as its arguments.
Chapter 24 considers the question of whether it is possible for genetic programming to evolve a set of primitive functions satisfying the closure requirement at the same time that genetic programming is evolving a sufficient set of
primitive functions, determining the architecture of the overall program, and
solving the problem. Sections 24.1 and 24.2 show that this is possible for the even-4-parity problem and the even-5-parity problem, respectively.
Chapters 21 through 24 demonstrated that genetic programming is capable
of evolving (selecting), in various separate combinations, the solution to a
problem, the architecture of the overall program, the primitive functions, and
the terminals while satisfying the sufficiency requirement and the closure
requirement.
Chapter 25 pulls the techniques of chapters 21 through 24 together and
shows that genetic programming can evolve the architecture, primitive functions, sufficiency, terminals, and closure, all at the same time as it solves a
problem. Section 25.1 presents results for the even-4-parity problem. Section
25.2 presents results for the even-5-parity problem.
Chapter 26 explores the role that representation plays in facilitating or
thwarting the solution of problems. Specifically, programs with automatically
defined functions provide a different way of viewing a problem space than
programs without automatically defined functions. To show this, the chapter
revisits various problems from this book in terms of the distribution of values
of fitness for one set of 1,000,000 randomly generated programs with
automatically defined functions and a second set of 1,000,000 randomly generated programs without them. For these problems, there is a difference
between the two distributions in terms of their outliers. Since the generation
of these 1,000,000 programs does not, of course, involve either the Darwinian
operation of reproduction or the genetic operation of crossover, the difference in distributions is a reflection solely of the way points in the search space
of the problem are represented. The difference is a reflection solely of the
chosen representation scheme. The representation chosen to view the points
in the search space of the problem is a kind of lens through which the system
views its world. It appears that a computer program incorporating automatically defined functions provides a better lens for viewing problems whose
environment is regular, symmetric, homogeneous, and modular than does a
computer program composed of similar ingredients without automatically
defined functions. We call this difference the "lens effect."
The organization and style of this book has been dictated by the fact that
our conclusions depend on experimental evidence. This book does not provide any mathematical proof that genetic programming with automatically
defined functions can always be successfully used, much less advantageously
used, to solve all problems of every conceivable type. It does, however, provide empirical evidence to support its observations. The ability of an
independent researcher to replicate the results is therefore crucial. To facilitate replication by other researchers, each chapter has been organized in a
uniform style that clearly identifies the key details of the problem, identifies
the preparatory steps that must be taken to apply genetic programming to
the problem, and presents the results of our actual runs. I believe that sufficient information is provided for each experiment described herein to allow
it to be independently replicated so as to produce substantially similar results
(within the limits inherent in any process involving probabilistic operations
and subject to minor details of implementation).
The conclusion (chapter 27) recapitulates the eight main points that are
supported by the evidence from the various problems in this book.
Background on Genetic Algorithms, LISP, and
Genetic Programming
This chapter contains a brief explanation of the conventional genetic
algorithm, a brief introduction to the LISP programming language, a brief
introduction of the basic ideas of genetic programming, and pointers to sources
of additional information about evolutionary computation. The purpose of
this chapter is to provide background which will make this book a self-contained explanation of genetic programming with automatically defined functions. Genetic Programming contains considerable additional detail on the
subjects of this chapter.
Readers already familiar with genetic programming may decide to skip
this chapter.
2.1 BACKGROUND ON GENETIC ALGORITHMS
John Holland's pioneering book Adaptation in Natural and Artificial Systems
(1975, 1992) showed how the evolutionary process can be used to solve
problems by means of a highly parallel technique that is now called the
genetic algorithm.
The genetic algorithm transforms a population of individual objects, each with
an associated value of fitness, into a new generation of the population, using
the Darwinian principle of survival and reproduction of the fittest and
analogs of naturally occurring genetic operations such as crossover (sexual
recombination) and mutation.
Each possible point in the search space of a problem is encoded, using a
problem-specific representation scheme, as a fixed-length character string (i.e.,
as a chromosome) or other mathematical object. The genetic algorithm attempts
to find the best (or at least a very good) solution to the problem by genetically
breeding the population of individuals over a number of generations.
There are four major preparatory steps required to use the conventional
genetic algorithm on fixed-length character strings to solve a problem, namely
determining
(1) the representation scheme,
(2) the fitness measure,
(3) the parameters and variables for controlling the algorithm, and
(4) a way of designating the result and a criterion for terminating a run.
In the conventional genetic algorithm, the individuals in the population
are usually fixed-length character strings patterned after chromosome strings.
Specification of the representation scheme in the conventional genetic algorithm starts with a determination of the string length L and the alphabet size
K. Often the alphabet is binary, so K equals 2. The most important part of the
representation scheme is the mapping that expresses each possible point in
the search space of the problem as a particular fixed-length character string
(i.e., as a chromosome) and each such chromosome as a point in the search
space of the problem.
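As a minimal sketch of such a mapping, assume (purely for illustration) that the search space is the set of integers from 0 to K^L - 1 and that the alphabet is binary. The function names `encode` and `decode` are hypothetical and do not come from the book.

```python
def encode(point: int, length: int) -> str:
    """Map a point in the (assumed) integer search space to a fixed-length binary chromosome."""
    if not 0 <= point < 2 ** length:
        raise ValueError("point cannot be expressed with this string length")
    return format(point, f"0{length}b")


def decode(chromosome: str) -> int:
    """Map a binary chromosome back to the point it represents."""
    return int(chromosome, 2)


L, K = 3, 2                 # string length and alphabet size
print(K ** L)               # number of distinct chromosomes: 8
print(encode(3, L))         # 011
print(decode("110"))        # 6
```

Because every point has exactly one chromosome and every chromosome decodes to exactly one point, this toy representation satisfies the two-way mapping described above.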
A precondition for solving a problem with the genetic algorithm is that the
representation scheme satisfy the sufficiency requirement in the sense that it is
capable of expressing a solution to the problem.
Finding a representation scheme that facilitates solution of a problem by
the genetic algorithm often requires considerable insight into the problem
and good judgment.
The evolutionary process is driven by the fitness measure. The fitness measure assigns a fitness value to each fixed-length character string that it
encounters in the population. The fitness measure should satisfy the requirement of being fully defined in the sense that it is capable of evaluating any
fixed-length character string that it encounters in any generation of the population. The nature of the fitness measure varies with the problem.
The primary parameters for controlling the genetic algorithm are the population size, M, and the maximum number of generations to be run, G. Populations can consist of hundreds, thousands, tens of thousands, or more
individuals. There can be dozens, hundreds, thousands, or more generations
in a run of the genetic algorithm. In addition, there are a number of secondary
quantitative and qualitative control variables that must be specified in order
to run the genetic algorithm (as enumerated in Genetic Programming,
table 27.8).
Each run of the genetic algorithm requires specification of a termination
criterion for deciding when to terminate a run and a method of result designation. The termination criterion for a run of the genetic algorithm usually consists of either satisfying a problem-specific success predicate or completing a
specified maximum number of generations to be run, G.
The success predicate depends on the nature of the problem and the user's
goal. For example, the success predicate may consist of achieving a result that
exceeds a certain threshold. Sometimes it is possible to recognize a 100%-correct
solution to a problem when it is discovered (even though one did not
know the answer before the result was encountered). One frequently used
method of result designation for a run of the genetic algorithm is to designate
the best individual obtained in any generation of the population during the
run (i.e., the best-so-far individual) as the result of the run. Another method
involves designating the best individual obtained in the generation on which
the run terminated as the result of the run.
Once the four preparatory steps for setting up the genetic algorithm have
been completed, the genetic algorithm can be run.
The three steps in executing the genetic algorithm operating on fixed-length
character strings are as follows:
(1) Randomly create an initial population of individual fixed-length character strings.
(2) Iteratively perform the following substeps on the population of strings
until the termination criterion has been satisfied:
(a) Assign a fitness value to each individual in the population using the
fitness measure.
(b) Create a new population of strings by applying the following three
genetic operations. The genetic operations are applied to individual
string(s) in the population selected with a probability based on fitness
(with reselection allowed).
(i) Reproduce an existing individual string by copying it into the
new population.
(ii) Create two new strings from two existing strings by genetically
recombining substrings using the crossover operation at a
randomly chosen crossover point.
(iii) Create a new string from an existing string by randomly mutating
the character at one randomly chosen position in the string.
(3) Designate the string that is identified by the method of result designation
(e.g., the best-so-far individual) as the result of the genetic algorithm for
the run. This result may represent a solution (or an approximate
solution) to the problem.
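The three steps above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not the book's implementation: the alphabet is binary, fitness is a non-negative number to be maximized, selection is proportional to fitness, mutation flips one bit, and the function name `run_ga` and the default probabilities are hypothetical choices.

```python
import random


def run_ga(fitness, length, pop_size, max_gens, success,
           p_r=0.10, p_c=0.89, p_m=0.01, rng=None):
    """One run of a conventional GA on fixed-length binary strings (sketch)."""
    rng = rng or random.Random()

    # Step (1): randomly create the initial population.
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    best, best_fit = None, None

    for gen in range(max_gens + 1):
        # Step (2a): assign a fitness value to each individual.
        scores = [fitness(s) for s in population]
        for s, f in zip(population, scores):
            if best_fit is None or f > best_fit:
                best, best_fit = s, f          # track the best-so-far individual

        # Termination criterion: success predicate or generation limit.
        if success(best_fit) or gen == max_gens:
            break

        # Step (2b): select with a probability based on fitness (reselection allowed).
        weights = [f + 1e-9 for f in scores]   # assumption: fitness is non-negative

        def pick():
            return rng.choices(population, weights=weights, k=1)[0]

        new_population = []
        while len(new_population) < pop_size:
            r = rng.random()
            if r < p_r:                        # (i) reproduction
                new_population.append(pick())
            elif r < p_r + p_c:                # (ii) crossover at a random point
                a, b = pick(), pick()
                point = rng.randint(1, length - 1)
                new_population.append(a[:point] + b[point:])
                new_population.append(b[:point] + a[point:])
            else:                              # (iii) mutation of one position
                s = pick()
                i = rng.randrange(length)
                flipped = "1" if s[i] == "0" else "0"
                new_population.append(s[:i] + flipped + s[i + 1:])
        population = new_population[:pop_size]

    # Step (3): designate the best-so-far individual as the result of the run.
    return best, best_fit


# Toy problem (an assumption for illustration): maximize the number of 1s.
result, score = run_ga(lambda s: s.count("1"), length=8, pop_size=50,
                       max_gens=30, success=lambda f: f == 8)
print(result, score)
```

Since every step after the fitness evaluation is probabilistic, repeated calls generally return different results, and a given call is not guaranteed to satisfy the success predicate.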
The genetic algorithm is a probabilistic algorithm. Probabilistic steps are
involved for creating the initial population, selecting individuals from the
population on which to perform each genetic operation (e.g., reproduction,
crossover), and choosing a point (i.e., a crossover point or a mutation point)
within the selected individual at which to perform the selected genetic operation. Additional probabilistic steps are often involved in measuring fitness.
Thus, anything can happen and nothing is guaranteed in the genetic algorithm.
In practice, it is usually necessary to make multiple independent runs of
the genetic algorithm in order to obtain a result that the user considers successful for a given problem. Thus, the above three steps are, in practice,
embedded in an outer loop representing separate runs.
Figure 2.1 is a flowchart of one possible way of implementing the conventional genetic algorithm. Run is the current run number. N is the maximum
number of runs to be made. The variable Gen refers to the current generation
number. M is the population size. The index i refers to the current individual
in the population. The sum of the probability of reproduction, p_r, the probability of crossover, p_c, and the probability of mutation, p_m, is one.
Figure 2.1 Flowchart of the conventional genetic algorithm. (Flowchart boxes recoverable from the figure: Create Initial Random Population for Run; Termination Criterion Satisfied for Run?; Designate Result for Run; Run := Run + 1; Evaluate Fitness of Each Individual in Population; Select Two Individuals Based on Fitness; Perform Reproduction; Insert Mutant into ...; Gen := Gen + 1.)
The best individual produced by looping over i is the best-of-generation
individual; the best individual produced by looping over Gen is the best-of-run individual; and the best individual produced by looping over Run is the
best-of-all individual. If there is a tie for any of these classes of best individual,
the single individual that first produced the best result is arbitrarily designated as the best.
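The tie-breaking rule can be made concrete with a small sketch. It assumes that larger fitness values are better and uses a hypothetical helper name, `best_individual`; the strict comparison is what makes the first individual to reach the best value win any tie.

```python
def best_individual(individuals, fitness):
    """Return (individual, fitness) for the best individual; the earliest one wins ties."""
    best, best_fit = None, None
    for individual in individuals:
        f = fitness(individual)
        # Strict '>' means a later individual with equal fitness never replaces the first.
        if best_fit is None or f > best_fit:
            best, best_fit = individual, f
    return best, best_fit


generation = ["011", "110", "000"]
# Toy fitness (an assumption for illustration): the number of 1s in the string.
print(best_individual(generation, lambda s: s.count("1")))   # ('011', 2)
```

The same helper can be applied to the list of best-of-generation individuals to obtain the best-of-run individual, and to the list of best-of-run individuals to obtain the best-of-all individual.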
The genetic operation of reproduction is based on the Darwinian principle of reproduction and survival of the fittest. In the reproduction operation, an individual is probabilistically selected from the population on
the basis of its fitness (with reselection allowed) and then the individual is
copied, without change, into the next generation of the population. The
selection is done in such a way that the better an individual's fitness, the
more likely it is to be selected.
The genetic operation of crossover allows new individuals to be created and
new points in the search space to be tested. The operation of crossover starts
with two parents independently selected probabilistically from the population on the basis of their fitness (with reselection allowed). As before, the
selection is done in such a way that the better an individual's fitness, the
more likely it is to be selected. The crossover operation produces two
offspring. Each offspring contains some genetic material from each of
its parents.
Individuals from the population can be selected and, in general, are selected
more than once during a generation to participate in the operations of reproduction and crossover. Indeed, the differential rates of survival, reproduction, and participation in genetic operations by more fit individuals are an
essential part of the genetic algorithm.
Tables 2.1 through 2.4 illustrate the crossover operation being applied to
the two parental strings 011 and 110 of length L = 3 over an alphabet of size
K = 2. Table 2.1 shows the two parents.

Table 2.1 Two parental strings.
Parent 1    Parent 2
011         110

The crossover operation begins by randomly choosing a number between
1 and L-1 using a uniform probability distribution. There are L-1 = 2 interstitial locations lying between the positions of a character string of length L = 3.
In the crossover operation, one of these interstitial locations (say the second)
is randomly chosen and becomes the crossover point. Each parent is then split
at this crossover point into a crossover fragment and a remainder. Table 2.2
shows the crossover fragments of parents 1 and 2.

Table 2.2 Two crossover fragments.
Crossover fragment 1    Crossover fragment 2
01-                     11-

The part of each parent that remains after the crossover fragment is identified is called the remainder. Table 2.3 shows the remainders of parents 1 and 2.

Table 2.3 Two remainders.
Remainder 1    Remainder 2
--1            --0

The crossover operation combines crossover fragment 1 with remainder 2
to create offspring 1. Similarly, the crossover operation combines crossover
fragment 2 with remainder 1 to create offspring 2. Table 2.4 shows the two
offspring.

Table 2.4 Two offspring produced by crossover.
Offspring 1    Offspring 2
010            111
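The worked example with parents 011 and 110 can be reproduced with a short sketch. The function name `one_point_crossover` is a hypothetical label; the crossover point is passed in explicitly here so that the example is deterministic, whereas in a run it would be drawn uniformly from 1 to L-1.

```python
def one_point_crossover(parent1: str, parent2: str, point: int):
    """Split both parents at an interstitial location and swap the remainders."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same fixed length")
    if not 1 <= point <= len(parent1) - 1:
        raise ValueError("crossover point must lie between 1 and L-1")
    fragment1, remainder1 = parent1[:point], parent1[point:]
    fragment2, remainder2 = parent2[:point], parent2[point:]
    offspring1 = fragment1 + remainder2   # crossover fragment 1 with remainder 2
    offspring2 = fragment2 + remainder1   # crossover fragment 2 with remainder 1
    return offspring1, offspring2


# Second interstitial location, as in the example in the text.
print(one_point_crossover("011", "110", 2))   # ('010', '111')
```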
The two offspring are usually different from their two parents and different from each other. Crossover is a creative operation that produces new
individuals that are composed entirely of genetic material from their two
parents. Intuitively, if a character string represents a somewhat effective
approach to solving a given problem, then some values at some positions of
that character string probably have some merit. More important, some combinations of values situated at two or more positions probably have some
merit when they are present together in the character string. By recombining
randomly chosen parts of somewhat effective character strings, a new character string that represents an even more fit approach to solving the problem
may be produced.
In the special case where the two parents selected to participate in crossover are identical, the two offspring will be identical to each other and identical to their parents, regardless of the crossover point. This incestuous case
occurs frequently because of the Darwinian selection of individuals to participate in the reproduction and crossover operations on the basis of their
fitness. Consequently, identical copies of a highly fit individual may come to
dominate a population. Premature convergence occurs when an individual
becomes dominant in a population but is not the global optimum of the search
space.
The operation of mutation begins by probabilistically selecting an individual
from the population on the basis of its fitness. A mutation point along the string
is chosen at random, and the single character at that point is randomly changed.
The altered individual is then copied into the next generation of the population. Mutation is potentially useful in restoring genetic diversity that may be
lost in a population because of premature convergence. Mutation is used very
sparingly in most genetic algorithm work.
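A minimal sketch of the mutation operation on an already selected individual follows. It assumes that "randomly changed" means the new character always differs from the old one; the function name `mutate` and its parameters are hypothetical.

```python
import random


def mutate(individual: str, alphabet: str = "01", rng=random) -> str:
    """Return a copy of the string with the character at one random position changed."""
    point = rng.randrange(len(individual))            # mutation point
    # Assumption: the replacement is drawn from the other characters of the alphabet.
    alternatives = [c for c in alphabet if c != individual[point]]
    return individual[:point] + rng.choice(alternatives) + individual[point + 1:]


print(mutate("0000"))   # exactly one position differs from the original
```

The function returns a new string and leaves its argument untouched, which matches the practice of operating on copies of the selected individuals.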
In implementing the genetic algorithm on a computer, the reproduction,
crossover, and mutation operations are performed on copies of the selected
individuals. The selected individuals remain unchanged in the population
until the end of the current generation. More fit individuals are generally
reselected many times to participate in the operations.
The Darwinian selection of individuals to participate in the operations of
reproduction, crossover, and mutation on the basis of their fitness is an essential aspect of the genetic algorithm. When an individual is selected on the
basis of its fitness to be copied (with or without mutation) into the next
generation of the population, the effect is that the new generation contains
the characteristics it embodies. These characteristics consist of certain values
at certain positions of the character string and, more importantly, certain combinations of values situated at two or more positions of the string. When two
individuals are selected on the basis of their fitness to be recombined, the new
generation contains the characteristics of both of these parents.
The probabilistic selection used in the genetic algorithm is an essential
aspect of the algorithm. The genetic algorithm allocates every individual, however poor its fitness, some chance of being selected to participate in the
operations of reproduction, crossover, and mutation. That is, the genetic
algorithm is not merely a greedy hillclimbing algorithm. Instead, the genetic
algorithm resembles simulated annealing (Kirkpatrick, Gelatt, and Vecchi 1983;
Aarts and Korst 1989; van Laarhoven and Aarts 1987) in that individuals that
are known to be inferior are occasionally selected. In fact, simulated annealing resembles a genetic algorithm with a population size, M, of 1.
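One common way to realize this kind of probabilistic selection is fitness-proportionate ("roulette wheel") selection. The sketch below assumes that every fitness value is strictly positive and that larger values are better; the function name `select` is hypothetical, and the text does not commit to this particular scheme at this point.

```python
import random

def select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    spin = rng.uniform(0, total)          # spin the roulette wheel
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if spin <= running:
            return individual
    return population[-1]                 # guard against floating-point round-off
```

Under these assumptions even the least fit individual retains a nonzero chance of being chosen, which is what distinguishes the method from a greedy hillclimber.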
The fact that the genetic algorithm operates on a population of individuals,
rather than a single point in the search space of the problem, is an essential
aspect of the algorithm. The advantage conferred by the existence of a population is not merely the obvious benefit of dropping 1,000 parachutists, rather
than one, onto the fitness landscape. The population serves as the reservoir of
the probably-valuable genetic material that the crossover operation needs to
create new individuals with probably-valuable new combinations of characteristics.
The genetic algorithm works in a domain-independent way on the fixed-length character strings in the population. The genetic algorithm searches the
space of possible character strings in an attempt to find high-fitness strings.
The space may be highly nonlinear and its fitness landscape may be very
rugged. To guide this search, the genetic algorithm uses only the numerical
fitness values associated with the explicitly tested strings. Regardless of the
particular problem domain, the genetic algorithm carries out its search by
performing the same disarmingly simple operations of copying, recombining, and occasionally randomly mutating the strings.
In practice, the genetic algorithm is surprisingly rapid in effectively searching complex, highly nonlinear, multidimensional search spaces. This is all the
more surprising because the genetic algorithm does not have any knowledge
about the problem domain except for the information indirectly provided by
the fitness measure.
Genetic algorithms superficially seem to process only the particular individual character strings actually present in the current generation of the population. However, Adaptation in Natural and Artificial Systems (Holland 1975,
1992) focused attention on the remarkable fact that the genetic algorithm
implicitly processes, in parallel, a large amount of useful information concerning unseen Boolean hyperplanes (schemata). A schema (plural: schemata)
is a set of points from the search space of a problem with certain specified
similarities. A schema is described by a string over an extended alphabet
consisting of the alphabet of the representation scheme (e.g., 0 and 1 if the
alphabet is binary) and a don't care symbol (denoted by an asterisk).
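To make the set interpretation of a schema concrete, the following Python sketch tests membership and enumerates the members of a schema. It assumes a binary alphabet; the helper names `matches` and `members` are hypothetical.

```python
from itertools import product

def matches(schema, string):
    """True if `string` belongs to the set described by `schema` ('*' = don't care)."""
    return len(schema) == len(string) and all(
        s == "*" or s == c for s, c in zip(schema, string)
    )

def members(schema, alphabet="01"):
    """List every string of the search space that is contained in `schema`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(schema)))
    return [c for c in candidates if matches(schema, c)]
```

For instance, `members("1*0")` yields the two strings `100` and `110`, and a schema consisting only of asterisks contains the whole search space.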
The genetic algorithm creates individual strings in the new generation of
the population in such a way that each schema can be expected to be automatically represented in proportion to the ratio of its schema fitness (i.e., the
average of the fitness of all the points from the search space that are contained
in the schema) to the average population fitness (i.e., the average of the fitness of
all the points from the search space that are contained in the population).
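The ratio described above can be computed directly for a small search space. In this Python sketch the fitness measure (the integer value of the binary string) and the four-member population are hypothetical choices made only for illustration.

```python
from itertools import product

def schema_points(schema, alphabet="01"):
    """All strings of the search space contained in `schema` ('*' = don't care)."""
    options = [alphabet if s == "*" else s for s in schema]
    return ["".join(p) for p in product(*options)]

def schema_fitness(schema, fitness):
    """Average fitness of all points of the search space contained in the schema."""
    points = schema_points(schema)
    return sum(fitness(p) for p in points) / len(points)

def average_population_fitness(population, fitness):
    """Average fitness of the individuals in the current population."""
    return sum(fitness(p) for p in population) / len(population)

def fitness(string):
    """Hypothetical fitness measure: the integer value of the binary string."""
    return int(string, 2)

population = ["011", "001", "110", "010"]
ratio = schema_fitness("1**", fitness) / average_population_fitness(population, fitness)
```

Here the schema `1**` has schema fitness 5.5 and the population has average fitness 3.0, so the ratio exceeds 1 and the schema would be expected to gain representation in the next generation.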
An important conclusion in Adaptation in Natural and Artificial Systems (Holland 1975, 1992) is that the growth rate for each schema in the genetic algorithm
is an approximately optimal use of the available information in maximizing
the payoff from the genetic algorithm over a period of generations.
The success of the genetic algorithm in solving problems also arises from
the creative role of the crossover operation. Indeed, a once-controversial point
in Adaptation in Natural and Artificial Systems (Holland 1975, 1992) concerns
the preeminence of the crossover operation and the relative unimportance of
mutation in the evolutionary process in nature and in solving artificial problems of adaptation using the genetic algorithm. The genetic algorithm relies
primarily on crossover. The role of mutation is comparatively insignificant.
Figure 2.2 presents a geometric interpretation of the crossover operation as
applied to the same illustrative problem for which L = 3 and K = 2. It shows
the parental strings 011 and 110 that produce the string 111 as one of their
offspring. Each point in the search space is represented by a chromosome
string of length L over the binary alphabet. The 2^L = 8 vertices of a hypercube
of dimensionality L = 3 represent the points in the search space of the problem. The population of chromosomes is a subset of the vertices of the
hypercube. The two parents 011 and 110 participating in the crossover are
points in the search space of the problem and are thus represented by two
vertices of the hypercube. The offspring 111 produced by the crossover of 011
and 110 is represented as another vertex of the hypercube. All three of these
individuals are shown in the figure as solid black circles.
Crossover fragment 11- may be thought of as the set containing all the
strings of length L from the search space that have 1 in their first position,
have 1 in their second position, and have either 0 or 1 in their third position (i.e., "don't care" about the third position). In other words, the crossover fragment 11- can be viewed as the associated schema 11*. Schemata
are explained in detail in Genetic Programming (section 3.2), although a
detailed understanding of schemata is not necessary to follow the argument being made here. The schema 11* is the set of strings of length 3
from the search space that have a 1 in their first positions and a 1 in their
second positions. The * in the third position of schema 11* indicates that
we don't care what symbol (0 or 1) is in that position of the strings. Thus,
this schema (set) has two members, namely the points 110 and 111 from
the search space of the problem. The geometric interpretation of this set of
two points is the straight line (hyperplane of dimensionality 1) along the
top of the hypercube. One of the points in the schema, namely 110, is necessarily one of the parents participating in the crossover.
Figure 2.2 Geometric interpretation of the crossover operation recombining parents 011 and
110 to produce 111 as an offspring.
Similarly, the remainder --1 may be viewed in terms of its associated
schema **1. The schema **1 contains all strings that have either 0 or 1 in their
first position (i.e., "don't care" about the first position), either 0 or 1 in their
second position (i.e., "don't care" about the second position), and have a 1 in
their third position. The remainder --1 may be viewed as the schema (set) **1
containing the four members 001, 101, 011, and 111. The geometric interpretation of this set of four points is the plane (hyperplane of dimensionality 2) on
the right of the hypercube incorporating the four points 001, 101, 011, and 111.
As before, one of the points, namely 011, is necessarily one of the parents
participating in the crossover.
The important feature of the crossover operation is that the offspring 111
produced by the crossover operation lies at the intersection of the two schemata
(sets). Specifically, the offspring 111 is at the intersection of the straight line
represented by the schema 11* and the plane represented by the schema **1.
Each of the 2^L points in the search space of a problem (i.e., each vertex of
the hypercube of dimensionality L) belongs to 2^L sub-hyperplanes (schemata)
of dimensionality between 0 and L. For example, when L = 3, each vertex of
the hypercube of dimensionality 3 belongs to 2^L = 8 hyperplanes of dimensionality between 0 and 3. Specifically, each vertex belongs to one hyperplane
of dimensionality 0 (i.e., the point itself), three straight lines (i.e., hyperplanes
of dimensionality 1), three planes (i.e., hyperplanes of dimensionality 2), and
one hypercube of dimensionality 3 (i.e., the whole search space).
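This count can be checked by enumeration: a schema containing a given point is obtained by either keeping or replacing with a don't care symbol each of its L positions. The Python sketch below uses hypothetical helper names.

```python
from itertools import product

def schemata_containing(point):
    """All 2^L schemata containing `point`: each position is kept or replaced by '*'."""
    return ["".join(s) for s in product(*[(c, "*") for c in point])]

def dimensionality(schema):
    """The dimensionality of a schema's hyperplane is its number of don't-care positions."""
    return schema.count("*")
```

Applied to the point `011`, the enumeration gives eight schemata: one of dimensionality 0, three of dimensionality 1, three of dimensionality 2, and one of dimensionality 3.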
When a particular point in the search space is observed to have a certain
fitness value, this observed fitness can serve as an estimate of the fitness of all
of the 2^L sub-hyperplanes to which the particular point belongs. In other words,
the fitness of a single point can be attributed to each of the 2^L sub-hyperplanes to which the point belongs. This estimate is admittedly rough and
sometimes incorrect. Indeed, the correct fitness of a sub-hyperplane of
dimensionality j < L is the average of the fitness values for all 2^j points in
the sub-hyperplane. In practice, the population size, M, employed in the genetic
algorithm is very small in relation to the 2^L points in the search space and is
also very small in relation to the 2^j points in a hyperplane of dimensionality j
(for all but the smallest values of j). Consequently, there are usually only a
few members of the population (only one member in this example) from which
to estimate the hyperplane fitness. Nonetheless, if only this small number, M,
of points from the search space have been explicitly measured for fitness, this
admittedly rough and sometimes-incorrect estimate of the hyperplane fitness is the best available estimate.
Figure 2.3 Geometric interpretation of the mutation operation operating on parent 011 to produce 001, 010, or 111 as an offspring.
The two parents are selected to participate in the crossover operation on
the basis of their fitness. In practice, this usually means that both parents
have relatively high fitness. If we attribute the fitness of the two observed
parental points to all the points in the straight line 11* and to all the points in
the plane **1, we see that the offspring point 111 at the intersection of this
straight line and this plane **1 shares two independent estimates that it has
relatively high fitness. In other words, when the crossover operation creates a
new offspring individual, there are two independent pieces of evidence, both
admittedly rough and sometimes incorrect, suggesting that the new individual
may have relatively high fitness. Thus, the crossover operation directs the
future search by the genetic algorithm into areas of the overall search space
that tend to have higher and higher fitness.
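The intersection property underlying this argument can be verified for the running example. The Python sketch below assumes one-point crossover in which the fragment comes from the first parent and the remainder from the second; the function names are hypothetical.

```python
from itertools import product

def crossover(parent1, parent2, point):
    """One-point crossover: the fragment of parent1 joined to the remainder of parent2."""
    return parent1[:point] + parent2[point:]

def matches(schema, string):
    """True if `string` belongs to `schema` ('*' = don't care)."""
    return all(s == "*" or s == c for s, c in zip(schema, string))

fragment_schema = "11*"     # crossover fragment 11- contributed by parent 110
remainder_schema = "**1"    # remainder --1 contributed by parent 011
offspring = crossover("110", "011", point=2)

# Every point of the search space lying in both schemata.
intersection = [
    "".join(p) for p in product("01", repeat=3)
    if matches(fragment_schema, "".join(p)) and matches(remainder_schema, "".join(p))
]
```

The offspring produced is `111`, and it is the only point of the search space that lies in both the line `11*` and the plane `**1`.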
In contrast, when the mutation operation is applied to a single individual in the population selected on the basis of fitness, the newly created
mutant is a point at the end of one of the straight lines (hyperplanes of
dimensionality 1) radiating away from the single parental individual. The
mutant lies in various schemata (a line, two planes, and the entire search
space) to which the single individual belongs; however, the only piece
of evidence suggesting that the mutant has relatively high fitness is the
original selection of the single parent.
Figure 2.3 presents a geometric interpretation of the mutation operation
operating on the parental string 011. The three points in the search space at a
Hamming distance of 1 (i.e., 010, 111, or 001) are the offspring that may potentially be produced by the mutation operation. The parental string and the
three potential offspring are all shown as solid black circles.
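The set of potential mutants can be enumerated with a few lines of Python. The sketch assumes a binary alphabet, and the name `hamming_neighbors` is hypothetical.

```python
def hamming_neighbors(parent):
    """All binary strings that differ from `parent` in exactly one position."""
    flip = {"0": "1", "1": "0"}
    return [
        parent[:i] + flip[parent[i]] + parent[i + 1:]
        for i in range(len(parent))
    ]
```

For the parent `011`, the function returns the three strings `111`, `001`, and `010`, one for each possible mutation point.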
The fact that there is independent corroborating evidence in favor of the
offspring produced by crossover is one reason that crossover is more important than mutation in driving the genetic algorithm toward the successful
discovery of a global optimum point in the search space.
2.2 BACKGROUND ON LISP
Any computer program - whether it is written in FORTRAN, Pascal, C, C++,
assembly code, or any other programming language - can be viewed as a
sequence of applications of functions (operations) to arguments (values).
Compilers use this fact by first internally translating a given program into
a parse tree and then converting the parse tree into the more elementary
machine code instructions that actually run on the computer. However, this
important commonality underlying all computer programs is obscured by
the large variety of different types of statements, operations, instructions, syntactic constructions, and grammatical restrictions found in most programming languages.
Genetic programming is most easily understood if one thinks about it in
terms of a programming language that overtly and transparently views a
computer program as a sequence of applications of functions to arguments.
Moreover, since genetic programming initially creates computer programs
at random and then manipulates the programs by various genetically motivated operations, genetic programming may be implemented in a conceptually straightforward way in a programming language that permits a computer
program to be easily manipulated as data and then permits the newly created
data to be immediately executed as a program.
For these two reasons, the LISP (LISt Processing) programming language
is especially well suited for genetic programming. However, it should be recognized that genetic programming does not require LISP for its implementation and is not in any way based on LISP.
For the purpose of this discussion, we can view LISP as having only two
types of entities: atoms and lists. The constant 7 and the variable TIME are
examples of atoms in LISP. A list in LISP is written as an ordered collection of
items inside a pair of parentheses. (A B C D) and (+ 1 2) are examples of
lists in LISP.
Both lists and atoms in LISP are called symbolic expressions (S-expressions).
The S-expression is the only syntactic form in pure LISP. There is no syntactic
distinction between programs and data in LISP. In particular, all data in LISP
are S-expressions and all programs in LISP are S-expressions.
The LISP system works by evaluating (executing) whatever it sees. When
seen by LISP, a constant atom, such as 7, evaluates to itself, and a variable
atom, such as TIME, evaluates to the current value of the variable. When LISP
sees a list, the list is evaluated by treating the first element of the list (i.e.,
whatever is just inside the opening parenthesis) as a function. The function is
then applied to the results of evaluating the remaining elements of the list.
That is, the remaining elements of the list are treated as arguments to the
function. If an argument is a constant atom or a variable atom, this evaluation
is immediate; however, if an argument is a list, the evaluation of such an
argument involves a recursive application of the above steps.
For example, in the LISP S-expression (+ 1 2), the addition function +
appears just inside the opening parenthesis. The S-expression (+ 1 2)
calls for the application of the addition function + to two arguments, namely
the constant atoms 1 and 2. Since both arguments are atoms, they can be
immediately evaluated. The value returned as a result of the evaluation
of the entire S-expression (+ 1 2) is 3. Because the function + appears
to the left of the arguments, LISP S-expressions are examples of prefix
notation.
If any of the arguments in an S-expression are themselves lists (rather than
constant or variable atoms that can be immediately evaluated), LISP first evaluates these arguments. In Common LISP (Steele 1990), this evaluation is done
in a recursive, depth-first way, starting from the left. We use the conventions
of Common LISP throughout this book. The S-expression
(+ (* 2 3) 4)
illustrates the way that computer programs in LISP can be viewed as a
sequence of applications of functions to arguments. This S-expression calls
for the application of the addition function + to two arguments, namely the
sub-S-expression (* 2 3) and the constant atom 4. In order to evaluate the
entire S-expression, LISP must first evaluate the sub-S-expression (* 2 3).
This argument (* 2 3) calls for the application of the multiplication function * to the two constant atoms 2 and 3, so it evaluates to 6 and the entire
S-expression evaluates to 10.
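The evaluation rule described above can be imitated in a few lines of Python. The sketch is not a LISP implementation: it assumes that an S-expression is held as a nested Python list whose first element names the function, that variable atoms are Python strings looked up in a dictionary, and that only + and * are available. All names are hypothetical.

```python
import operator
from functools import reduce

FUNCTIONS = {"+": operator.add, "*": operator.mul}

def evaluate(expr, env=None):
    """Evaluate an S-expression held as nested lists, e.g. ["+", ["*", 2, 3], 4]."""
    env = env or {}
    if isinstance(expr, list):
        function = FUNCTIONS[expr[0]]                     # first element is the function
        # Arguments are evaluated recursively, depth first, from the left.
        args = [evaluate(arg, env) for arg in expr[1:]]
        return reduce(function, args)
    if isinstance(expr, str):
        return env[expr]                                  # variable atom: current value
    return expr                                           # constant atom: itself
```

With this representation, `evaluate(["+", ["*", 2, 3], 4])` first reduces the inner list to 6 and then returns 10.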
Other programming languages apply functions to arguments somewhat
differently. For example, the FORTH programming language uses postfix
notation. The above LISP S-expression would be written in
FORTH as
2 3 * 4 +
FORTH first evaluates the subexpression
2 3 *
by applying the multiplication function * to the 2 and the 3 to get 6. The
function * appears to the right of the two arguments, 2 and 3, in FORTH. It
then applies the addition function + to the intermediate result, 6, and the 4 to
get 10.
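Postfix evaluation of this kind is conventionally done with a stack. The following Python sketch handles only integer operands and the two-argument functions + and *; it is an illustration of the idea, not a description of how any FORTH system is implemented.

```python
def eval_postfix(text):
    """Evaluate a whitespace-separated postfix expression such as "2 3 * 4 +"."""
    stack = []
    for token in text.split():
        if token in ("+", "*"):
            right = stack.pop()           # arguments were pushed left to right
            left = stack.pop()
            stack.append(left + right if token == "+" else left * right)
        else:
            stack.append(int(token))      # operand: push its value
    return stack.pop()
```

Evaluating `"2 3 * 4 +"` pushes 2 and 3, replaces them with 6, pushes 4, and finally leaves 10 on the stack.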
FORTRAN, Pascal, and C use ordinary infix notation for two-argument functions, so the above LISP and FORTH programs would be
written as
2*3+4
in those languages. Here the multiplication function * appears between the
arguments 2 and 3 to indicate that the * is applied to the arguments 2 and 3.
Similarly, the addition function + is applied to the intermediate result, 6, and
the 4 to get 10.
The term "computer program," of course, carries the connotation of the
ability to do more than merely perform compositions of simple arithmetic
operations. Among the connotations of the term "computer program" is the
ability to perform alternative computations conditioned on the outcome of
intermediate calculations, to perform operations in a hierarchical way, and to
perform computations on variables of many different types. Unlike most other
programming languages, LISP goes about all these seemingly different things
in the same way: LISP treats the item just inside the outermost left parenthesis as a function and then applies that function to the remaining items of
the list.
For example, the LISP S-expression
(+ 1 2 (IF (> TIME 10) 3 4))
illustrates how LISP views conditional and relational elements of computer
programs as applications of functions to arguments. The three-argument addition function + at the top level calls for the application of the addition function to its three arguments: the constant atom 1, the constant atom 2, and the
sub-S-expression (IF (> TIME 10) 3 4). In the sub-sub-S-expression
(> TIME 10), the relation > is viewed as a function. The > is applied to the
variable atom TIME and the constant atom 10. The sub-sub-S-expression
(> TIME 10) then evaluates to either T (true) or NIL (false), depending on
the current value of the variable atom TIME. The conditional operator IF is
viewed as a function and is then applied to three arguments: the logical value,
T or NIL, returned by the subexpression (> TIME 10), the constant atom 3,
and the constant atom 4. If the first argument of an IF evaluates to T (more
precisely, anything other than NIL), the function IF returns the result of evaluating its second argument (i.e., the constant atom 3), but if the first argument
evaluates to NIL, the function IF returns the result of evaluating its third
argument (i.e., the constant atom 4). The S-expression as a whole evaluates to
either 6 or 7, depending on whether the current value of the variable atom
TIME is or is not greater than 10.
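The same uniform treatment can be imitated in Python for this example. The sketch below assumes that the S-expression is held as a nested list, that Python's `None` stands in for NIL and `True` for T, and that only +, >, and IF are defined; these representation choices are assumptions made for illustration.

```python
def evaluate(expr, env):
    """Evaluate a nested-list S-expression built from +, > and IF."""
    if not isinstance(expr, list):
        # Variable atoms are strings; constant atoms evaluate to themselves.
        return env[expr] if isinstance(expr, str) else expr
    head, *args = expr
    if head == "IF":
        condition, then_branch, else_branch = args
        # Anything other than NIL counts as true; only the chosen branch is evaluated.
        chosen = then_branch if evaluate(condition, env) is not None else else_branch
        return evaluate(chosen, env)
    values = [evaluate(arg, env) for arg in args]
    if head == "+":
        return sum(values)                                # accepts any number of arguments
    if head == ">":
        return True if values[0] > values[1] else None    # T or NIL
    raise ValueError(f"unknown function: {head}")

program = ["+", 1, 2, ["IF", [">", "TIME", 10], 3, 4]]
```

Calling `evaluate(program, {"TIME": 11})` returns 6, whereas `evaluate(program, {"TIME": 10})` returns 7, in agreement with the description above.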
Most other programming languages use different syntactic forms and statement types for operations such as *, >, and IF. Operator precedence rules
and parentheses are used in such languages to ensure the correct association
of arguments to operators. LISP performs all of these operations with a common syntax.
Figure 2.4 LISP S-expression depicted as a rooted, point-labeled tree with ordered branches.
One of the advantages of prefix or postfix notation is that a k-argument
function (such as the three-argument addition function above) is handled
in a more consistent and convenient fashion than is the case with ordinary
infix notation.
Any LISP S-expression can be graphically depicted as a rooted point-labeled tree with ordered branches. Figure 2.4 shows the tree corresponding to the S-expression (+ 1 2 (IF (> TIME 10) 3 4)). This tree has nine points
(i.e., functions and terminals).
In this graphical depiction, the three internal points of the tree are
labeled with functions +, IF, and >. The root of the tree is labeled with the
function appearing just inside the leftmost opening parenthesis of the
S-expression (i.e., the +). The six external points (leaves) of the tree are labeled
with terminals (the variable atom TIME and the constant atoms 1, 2, 3, 4, and
10). The branches are ordered because the order of the arguments matters for
many functions (e.g., IF and >). Of course, the order does not matter for commutative functions such as +.
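The point counts for this tree can be confirmed mechanically. The Python sketch below again assumes a nested-list representation of the S-expression; the name `count_points` is hypothetical.

```python
def count_points(expr):
    """Return (internal, external) point counts for a nested-list S-expression."""
    if not isinstance(expr, list):
        return (0, 1)                      # an atom is an external point (leaf)
    internal, external = 1, 0              # the function heading this list
    for arg in expr[1:]:
        sub_internal, sub_external = count_points(arg)
        internal += sub_internal
        external += sub_external
    return (internal, external)

tree = ["+", 1, 2, ["IF", [">", "TIME", 10], 3, 4]]
```

For this tree the function reports three internal points and six external points, nine points in all.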
This tree form of a LISP S-expression is equivalent to the parse tree that the
compilers of most high level programming languages construct internally,
unseen by the programmer, to represent the program being compiled.
An important feature of LISP is that all LISP computer programs have just
one syntactic form (the S-expression). The programs of the LISP programming language are S-expressions, and an S-expression is, in effect, the parse
tree of the program. Moreover, data is also represented in LISP by S-expressions. For these reasons, we use LISP throughout this book for presenting
computer programs and for explaining the genetic operations. However, it is
important to note that virtually any programming language is capable of
representing and implementing these programs and genetic operations. It is
not necessary to implement genetic programming in LISP. Indeed, since the
publication of Genetic Programming, versions of genetic programming have
been implemented in C, C++, Pascal, FORTRAN, Mathematica, Smalltalk,
and other programming languages.
2.3 BACKGROUND ON GENETIC PROGRAMMING
Genetic programming is an extension of the conventional genetic algorithm
described in section 2.1 in which the structures undergoing adaptation are
hierarchical computer programs of dynamically varying size and shape.
Genetic programming is an attempt to deal with one of the central questions in computer science: How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be
made to do what needs to be done, without being told exactly how to do it?
The search space in genetic programming is the space of all possible computer programs composed of functions and terminals appropriate to the problem domain.
In applying genetic programming to a problem, there are five major preparatory steps. These five steps involve determining
(1) the set of terminals,
(2) the set of primitive functions,
(3) the fitness measure,
(4) the parameters for controlling the run, and
(5) the method for designating a result and the criterion for terminating a
run.
The first major step in preparing to use genetic programming is to identify
the terminal set for the problem. The terminals correspond to the inputs of the
as-yet-undiscovered computer program.
The second major step in preparing to use genetic programming is to identify the function set. The functions may be standard arithmetic operations,
standard programming operations, standard mathematical functions, logical
functions, or domain-specific functions. The functions may perform their work
by returning one or more values or by performing side effects (e.g., on the
state of a system).
Each computer program (i.e., mathematical expression, LISP S-expression,
parse tree) is a composition of functions from the function set, F, and terminals from the terminal set, T. The set of terminals (along with the set of functions) are the ingredients from which genetic programming attempts to
construct a computer program to solve, or approximately solve, the problem.
A precondition for solving a problem with genetic programming is that the
set of terminals and the set of functions satisfy the sufficiency requirement in
the sense that they are together capable of expressing a solution to the
problem.
Each of the functions in the function set should be able to accept, as its
arguments, any value that may possibly be returned by any function in the
function set and any value that may possibly be assumed by any terminal in
the terminal set. A function set and terminal set that together satisfy this
requirement are said to satisfy the closure requirement.
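Ordinary division is a familiar example of a function that does not satisfy closure by itself, since it is undefined when its second argument is zero. One widely used remedy is a protected division function that returns a fixed value in that case. The Python sketch below assumes the value 1 is returned; the function name and that choice of value are illustrative assumptions, not something specified in this section.

```python
def protected_div(numerator, denominator):
    """Division defined for every numeric argument: yields 1 when the denominator is 0."""
    if denominator == 0:
        return 1
    return numerator / denominator
```

With such a function in the function set, any randomly composed arithmetic expression can be evaluated without raising an error, whatever values its subexpressions return.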
These first two major steps correspond to the step of specifying the representation scheme for the conventional genetic algorithm. The remaining three
major steps for genetic programming correspond exactly to the last three major
preparatory steps for the conventional genetic algorithm.
The evolutionary process is driven by a fitness measure that evaluates how
well each individual computer program in the population performs in its
problem environment. The fitness measure should satisfy the requirement of
being fully defined in the sense that it is capable of evaluating any computer
program that it encounters in any generation of the population.
The primary parameters for controlling a run of genetic programming are
the population size, M, and the maximum number of generations to be run,
G. In addition, there are a number of secondary parameters (quantitative and
qualitative control variables) that must be specified in order to control a run
of genetic programming (as identified in appendix D).
Each run of genetic programming requires the specification of a termination
criterion for deciding when to terminate a run and a method of result designation. We usually designate the best-so-far individual as the result of a run.
Once the five major steps for preparing to run genetic programming have
been completed, a run can be made.
In genetic programming, populations of thousands of computer programs
are bred genetically. This breeding is done using the Darwinian principle of
survival and reproduction of the fittest along with a genetic crossover operation appropriate for mating computer programs. As will be seen, a computer
program that solves (or approximately solves) a given problem may emerge
from this combination of Darwinian natural selection and genetic operations.
Genetic programming starts with an initial population of randomly generated computer programs composed of functions and terminals appropriate
to the problem domain. The creation of this initial random population is, in
effect, a blind random search of the search space of the problem as represented by the computer programs. Because a population is involved, genetic
programming may be viewed as a parallel search algorithm.
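A random initial population of this kind might be generated as in the following Python sketch. The function set, the terminal set, the depth limit, and the 0.3 probability of stopping early at a terminal are all hypothetical choices made for illustration; this passage does not prescribe a particular generative method.

```python
import random

FUNCTION_SET = {"+": 2, "-": 2, "*": 2}   # hypothetical functions and their arities
TERMINAL_SET = ["X", 1, 2]                # hypothetical terminals

def random_program(max_depth, rng=random):
    """Grow a random composition of functions and terminals as a nested list."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINAL_SET)               # stop this branch at a terminal
    function = rng.choice(list(FUNCTION_SET))
    arity = FUNCTION_SET[function]
    return [function] + [random_program(max_depth - 1, rng) for _ in range(arity)]

def initial_population(size, max_depth=4, rng=random):
    """Create generation 0: `size` independently generated random programs."""
    return [random_program(max_depth, rng) for _ in range(size)]
```

The programs produced differ in size and shape, and some consist of a single terminal.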
The nature of the fitness measure varies with the problem.
For some problems, the fitness of a computer program can be measured by
the error between the result produced by the computer program and the correct result. The closer this error is to zero, the better the computer program.
Typically, the error is not measured over just one combination of possible
inputs to the computer program. Instead, error is usually measured as a sum
(or average) over a number of representative combinations of the inputs to
the program (i.e., values of an independent variable). That is, the fitness of a
computer program in the population is measured over a number of different
fitness cases. The fitness cases may be chosen at random over a range of values
of the independent variables or in some structured way (e.g., at regular intervals over a range of values of each independent variable). For example, the
fitness of an individual computer program in the population may be measured in terms of the sum, over the fitness cases, of the absolute value of the
differences between the output produced by the program and the correct
answer to the problem (i.e., the Minkowski distance) or in terms of the square
root of the sum of the squares (i.e., Euclidean distance).
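Both error measures are easy to compute once the fitness cases are fixed. In the Python sketch below, the target function x^2 + x, the five regularly spaced fitness cases, and the candidate program x^2 are hypothetical, as is the function name `error_fitness`.

```python
import math

def error_fitness(program, fitness_cases):
    """Return (sum of absolute errors, root of summed squared errors) over the cases."""
    errors = [program(x) - target for x, target in fitness_cases]
    sum_abs = sum(abs(e) for e in errors)
    euclidean = math.sqrt(sum(e * e for e in errors))
    return sum_abs, euclidean

# Hypothetical target x^2 + x sampled at regular intervals over [-1, 1].
fitness_cases = [(x, x * x + x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

def candidate(x):
    """Hypothetical individual from the population: x^2."""
    return x * x

sum_abs, euclidean = error_fitness(candidate, fitness_cases)
```

For this candidate the error at each fitness case is -x, so the sum of absolute errors is 3.0 and the Euclidean measure is the square root of 2.5. A perfect program would score zero on both.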
For many problems, fitness is not computed directly from the value
returned by the computer program but instead is determined from the consequences of the execution of the program. For example, in a problem of optimal control, the value returned by the controller affects the state of the system.
The fitness of a program is based on the amount of time (fuel, distance, or
money, etc.) it takes to bring the system to a desired target state. The smaller
the amount of time (fuel, distance, or money, etc.), the better. The fitness cases
in problems of control often consist of a sampling of different initial conditions
of the system.
For problems involving a task, fitness may be measured in terms of
the amount of points scored (food eaten, work completed, cases correctly
handled, etc.).
If one is trying to recognize patterns or classify examples, the fitness of a
particular program may be measured by some combination of the number of
instances handled correctly (i.e., true positives and true negatives) and the
number of instances handled incorrectly (i.e., false positives and false negatives). For example, correlation may be used as the fitness measure in pattern
recognition and classification problems. The fitness cases consist of a representative sampling of patterns or items to be classified.
If the problem involves finding a good randomizer, the fitness of a given
program might be measured by entropy.
For some problems, it may be appropriate to use a multi-objective fitness
measure incorporating a combination of factors such as correctness, parsimony,
or efficiency.
In each of the foregoing examples, fitness was computed explicitly. However, fitness may be computed implicitly by permitting programs to interact
(usually in a simulation) with their environment or among themselves in a
situation where certain behavior leads to survival (and, consequently, the
opportunity to reproduce and recombine) and certain other behavior does
not.
The computer programs in the initial generation (i.e., generation 0) of the
process will generally have exceedingly poor fitness. Nonetheless, some
individuals in the population will turn out to be somewhat more fit than
others. These differences in performance are then exploited.
Both the Darwinian principle of reproduction and survival of the fittest
and the genetic operation of crossover are used to create a new offspring population of individual computer programs from the current population of programs.
The reproduction operation involves selecting a computer program from
the current population of programs on the basis of its fitness (i.e., the better
the fitness, the more likely the individual is to be selected) and allowing it to
survive by copying it into the new population.
A crossover operation capable of operating on computer programs
(described below) is used to create new offspring computer programs from
two parental programs selected on the basis of their fitness. The parental
programs typically differ from one another in size and shape. The offspring
programs are composed of subexpressions (subtrees, subprograms, subroutines, building blocks) from their parents. These offspring programs are typically of different sizes and shapes than their parents. If two computer programs
are somewhat effective in solving a given problem, then some of their parts
probably have some merit. Recombining randomly chosen parts of somewhat effective programs may yield a new computer program that is even
more fit at solving the problem.
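A crossover of this kind can be sketched in Python by swapping randomly chosen subtrees of two nested-list programs. The sketch assumes that every point of a parent is equally likely to be chosen as the crossover point, which is a simplification made here for brevity; all function names are hypothetical.

```python
import copy
import random

def subtree_paths(tree, path=()):
    """Yield the index path of every point (subtree) in a nested-list program."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    """Follow `path` down from the root and return the subtree found there."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by a copy of `new`."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = copy.deepcopy(new)
    return tree

def crossover(parent1, parent2, rng=random):
    """Swap randomly chosen subtrees of two parents, producing two offspring."""
    point1 = rng.choice(list(subtree_paths(parent1)))
    point2 = rng.choice(list(subtree_paths(parent2)))
    child1 = replace_subtree(parent1, point1, get_subtree(parent2, point2))
    child2 = replace_subtree(parent2, point2, get_subtree(parent1, point1))
    return child1, child2
```

Because the exchanged subtrees generally differ in size, the offspring typically differ in size and shape from both parents, while the parents themselves are left unchanged.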
The mutation operation may also be used in genetic programming.
After the genetic operations are performed on the current population, the
population of offspring (i.e., the new generation) replaces the old population
(i.e., the old generation).
Each individual in the new population of computer programs is then
measured for fitness, and the process is repeated over many generations.
At each stage of this highly parallel process, the state of the process will
consist only of the current population of individuals.
The force driving this process consists only of the observed fitness of the
individuals in the current population in grappling with the problem
environment.
As will be seen, this algorithm produces populations of computer programs
which, over many generations, tend to become increasingly fit at grappling
with their environment.
The hierarchical character of the computer programs that are produced is
an important feature of genetic programming. The results of genetic programming
are inherently hierarchical. In many cases the results produced by genetic
programming are default hierarchies, prioritized hierarchies of tasks, or
hierarchies in which one behavior subsumes or suppresses another. The dynamic
variability of the population of computer programs that are developed along
the way to a solution is also an important feature of genetic programming.
Another important feature of genetic programming is the absence or relatively
minor role of preprocessing of inputs and postprocessing of outputs.
The inputs, intermediate results, and outputs are typically expressed directly
in terms of the natural terminology of the problem domain. The computer
programs produced by genetic programming consist of functions that are
natural for the problem domain. The postprocessing of the output of a program,
if any, is done by a wrapper (output interface).
Finally, the structures undergoing adaptation in genetic programming are
active. They are not passive encodings of the solution to the problem. Given a
computer on which to run, the structures in genetic programming are active
structures which usually can be directly executed in their current form.
In summary, genetic programming breeds computer programs to solve
problems by executing the following three steps:
(1) Generate an initial population of random compositions of the functions
and terminals of the problem (computer programs).
(2) Iteratively perform the following substeps until the termination criterion
has been satisfied:
(a) Execute each program in the population and assign it a fitness value
using the fitness measure.
(b) Create a new population of computer programs by applying the
following two primary operations. The operations are applied to
computer program(s) in the population selected with a probability
based on fitness (with reselection allowed).
(i) Reproduce an existing program by copying it into the new
population.
(ii) Create two new computer programs from two existing programs
by genetically recombining randomly chosen parts of two
existing programs using the crossover operation applied at a
randomly chosen crossover point within each program.
(3) Designate the program that is identified by the method of result
designation (e.g., the best-so-far individual) as the result of the run of genetic
programming. This result may represent a solution (or an approximate
solution) to the problem.
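The three steps can be sketched in Python for a single run. This is an illustrative sketch under stated assumptions, not the implementation used for the runs in this book: the function set {+, -, *}, the terminal set {x, 1.0}, the fitness measure 1/(1 + total absolute error), the size cap `MAX_POINTS`, the parameter values, and all function names are hypothetical choices made only for the example.

```python
import random
import operator

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0]
MAX_POINTS = 60  # assumed size cap so offspring cannot grow without bound

def random_tree(rng, depth=3):
    """Random composition of functions and terminals, as a nested list."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return [op, random_tree(rng, depth - 1), random_tree(rng, depth - 1)]

def evaluate(tree, x):
    """Execute a program tree for one value of the input x."""
    if isinstance(tree, list):
        return FUNCTIONS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x if tree == "x" else tree

def points(tree, path=()):
    """Yield the path to every point of the tree, depth-first, left to right."""
    yield path
    if isinstance(tree, list):
        for i in (1, 2):
            yield from points(tree[i], path + (i,))

def subtree_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def with_subtree(tree, path, new):
    """Return a copy of tree in which the subtree at path is replaced by new."""
    if not path:
        return new
    clone = list(tree)
    clone[path[0]] = with_subtree(tree[path[0]], path[1:], new)
    return clone

def crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one of b."""
    pa, pb = rng.choice(list(points(a))), rng.choice(list(points(b)))
    return (with_subtree(a, pa, subtree_at(b, pb)),
            with_subtree(b, pb, subtree_at(a, pa)))

def fitness(tree, cases):
    """Assumed fitness measure, larger is better: 1 / (1 + total absolute error)."""
    return 1.0 / (1.0 + sum(abs(evaluate(tree, x) - y) for x, y in cases))

def run_gp(cases, pop_size=50, generations=20, p_crossover=0.9, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]      # step 1
    best, best_fitness = None, -1.0
    for _ in range(generations):                                   # step 2
        scores = [fitness(t, cases) for t in population]           # step 2(a)
        for tree, score in zip(population, scores):
            if score > best_fitness:
                best, best_fitness = tree, score                   # best-so-far
        if best_fitness == 1.0:                                    # termination criterion
            break
        offspring = []
        while len(offspring) < pop_size:                           # step 2(b)
            if rng.random() < p_crossover:                         # (ii) crossover
                a, b = rng.choices(population, weights=scores, k=2)
                for child, parent in zip(crossover(a, b, rng), (a, b)):
                    small = sum(1 for _ in points(child)) <= MAX_POINTS
                    offspring.append(child if small else parent)
            else:                                                  # (i) reproduction
                offspring.append(rng.choices(population, weights=scores, k=1)[0])
        population = offspring[:pop_size]
    return best, best_fitness                                      # step 3

# Hypothetical fitness cases: samples of x**2 + x on [-1, 1].
cases = [(x / 10.0, (x / 10.0) ** 2 + x / 10.0) for x in range(-10, 11)]
best, best_fitness = run_gp(cases)
```

Trees are never modified in place, so parents and offspring can safely share subtrees. The result designated in step 3 is the best-so-far individual, which need not belong to the final generation.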
Figure 2.5 is a flowchart that implements the above three steps of the
genetic programming paradigm. Run is the current run number. N is the
maximum number of runs to be made. The variable Gen refers to the current
generation number. M is the population size. The index i refers to the current
individual in the population. The sum of the probability of reproduction, $p_r$,
and the probability of crossover, $p_c$, is one.
Mutation is not used for any of the runs reported in this book for reasons
discussed in Genetic Programming (subsection 6.5.1). However, if mutation were
used, there would be a third branch flowing out of the sausage labeled "Select
Genetic Operation" (as in figure 2.1).
Crossover operates on two parental computer programs selected with a
probability based on fitness and produces two new offspring programs
consisting of parts of each parent.
For example, consider the following computer program (shown here as a
LISP S-expression):
(+ (* 0.234 Z) (- X 0.789)),
which we would ordinarily write as
0.234Z + X - 0.789.
This program takes two inputs (X and Z) and produces a floating-point
output.
Also, consider a second program:
(* (* Z Y) (+ Y (* 0.314 Z))),
which is equivalent to
ZY(Y + 0.314Z).
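To make the correspondence between the S-expressions and ordinary notation concrete, the two programs can be written as nested Python tuples and executed. This is a sketch only; the tuple encoding, the `evaluate` helper, and the sample input values are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env):
    """Evaluate a prefix expression given as nested tuples, e.g. ("+", a, b)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(expr, str):
        return env[expr]   # a variable terminal such as "X"
    return expr            # a numeric constant terminal

parent1 = ("+", ("*", 0.234, "Z"), ("-", "X", 0.789))
parent2 = ("*", ("*", "Z", "Y"), ("+", "Y", ("*", 0.314, "Z")))

env = {"X": 1.0, "Y": 2.0, "Z": 3.0}   # arbitrary sample inputs
value1 = evaluate(parent1, env)         # 0.234*Z + X - 0.789
value2 = evaluate(parent2, env)         # Z*Y*(Y + 0.314*Z)
```

The program itself is the data structure being evaluated, which is what allows a genetic operation to manipulate it directly.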
Figure 2.5 Flowchart for genetic programming.
In figure 2.6, these two parents are depicted as rooted, point-labeled trees
with ordered branches. Internal points (i.e., nodes) of the tree correspond to
functions (i.e., operations) and external points (i.e., leaves, endpoints)
correspond to terminals (i.e., input data). The numbers beside the function and
terminal points of the trees appear for reference only.
The crossover operation creates new offspring by exchanging subtrees (i.e.,
subroutines, sublists, subprocedures, subfunctions) between the two parents.
The subtrees to be exchanged are chosen at random. The two parents are
typically of different sizes and shapes. Suppose that the points of both trees
are numbered in a depth-first, left-to-right way starting at the top. Further
suppose that the point 2 (out of seven points of the first parent) is randomly
chosen as the crossover point for the first parent and that the point 5 (out of
nine points of the second parent) is randomly chosen as the crossover point of
the second parent. The crossover points in the trees above are therefore the
multiplication (*) in the first parent and the addition (+) in the second parent.
Figure 2.6 Two parental computer programs, 0.234Z + X - 0.789 and ZY(Y + 0.314Z).
The two crossover fragments are the two subtrees rooted at the chosen
crossover points as shown in figure 2.7. These two crossover fragments
correspond to the subprograms (sublists) (* 0.234 Z) and (+ Y (* 0.314 Z))
in the two parental computer programs above.
Figure 2.7 Two crossover fragments.
Figure 2.8 Two remainders.
Figure 2.9 Two offspring programs, Y + 0.314Z + X - 0.789 and ZY(0.234Z).
The remainder is the portion of a parent remaining after the deletion of its
crossover fragment.
Figure 2.8 shows the two remainders after removal of the crossover fragments
from the parents.
The first offspring is created by inserting the second parent's crossover
fragment into the first parent's remainder at the first parent's crossover point. The
second offspring is created by inserting the first parent's crossover fragment
into the second parent's remainder at the second parent's crossover point.
The two offspring resulting from crossover are
(+ (+ Y (* 0.314 Z)) (- X 0.789))
and
(* (* Z Y) (* 0.234 Z)).
The two offspring are shown in figure 2.9.
The crossover operation creates two new computer programs using parts
of existing parental programs. Because entire subtrees are swapped and
because of the closure requirement on the function set and terminal set, this
crossover operation always produces syntactically valid programs as offspring
regardless of the choice of the two crossover points.
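The worked example above can be reproduced with a short sketch. It assumes programs are encoded as nested tuples and that points are numbered depth-first, left to right, starting at 1 for the root; the helper names are hypothetical.

```python
def subtrees(tree):
    """Yield every subtree in depth-first, left-to-right order (point 1 is the root)."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)

def replace_point(tree, point, new_subtree):
    """Return a copy of tree with the subtree at the 1-based point replaced."""
    counter = [0]

    def walk(node):
        counter[0] += 1
        if counter[0] == point:
            return new_subtree   # insert the other parent's fragment here
        if isinstance(node, tuple):
            return (node[0],) + tuple(walk(child) for child in node[1:])
        return node

    return walk(tree)

def crossover(parent1, point1, parent2, point2):
    """Exchange the subtrees rooted at the two chosen crossover points."""
    fragment1 = list(subtrees(parent1))[point1 - 1]
    fragment2 = list(subtrees(parent2))[point2 - 1]
    return (replace_point(parent1, point1, fragment2),
            replace_point(parent2, point2, fragment1))

parent1 = ("+", ("*", 0.234, "Z"), ("-", "X", 0.789))
parent2 = ("*", ("*", "Z", "Y"), ("+", "Y", ("*", 0.314, "Z")))

# Crossover points 2 and 5, as in the example in the text.
child1, child2 = crossover(parent1, 2, parent2, 5)
```

Because whole subtrees are exchanged, each offspring is again a well-formed nested expression whichever points are chosen.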
Because programs are selected to participate in the crossover operation with
a probability based on their fitness, crossover allocates future trials of the
search for a solution to the problem to regions of the search space whose
programs contain parts from promising programs.
The crossover operation described above is the basic version of crossover
for mating computer programs in genetic programming. Implementation of
automatically defined functions requires structure-preserving crossover as
described in section 4.8.
2.4 SOURCES OF ADDITIONAL INFORMATION
The field of evolutionary computation includes genetic algorithms,
evolutionsstrategie, evolutionary programming, classifier systems, and genetic
programming.
Additional information on genetic algorithms can be found in Goldberg
1989; Davis 1987, 1991; Michalewicz 1992; and Buckles and Petry 1992.
Conference proceedings in the field of genetic algorithms include Grefenstette
1985, 1987; Schaffer 1989; Belew and Booker 1991; Forrest 1993; Rawlins 1991;
and Whitley 1992. Stender 1993 describes parallelization of genetic algorithms.
Davidor 1992 describes application of genetic algorithms to robotics. Schaffer
and Whitley 1992 and Albrecht, Reeves, and Steele 1993 describe work on
combinations of genetic algorithms and neural networks. Bauer 1994 describes
applications of genetic algorithms to investment strategies.
Much of the ongoing work of the Santa Fe Institute in New Mexico, as
reported in technical reports and other publications, is related to genetic
algorithms.
Recent work on evolutionsstrategie is emphasized in Schwefel and
Maenner 1991 and Maenner and Manderick 1992.
Conference proceedings in the field of evolutionary programming
include Fogel and Atmar 1992, 1993. Fogel 1991 describes the application
of evolutionary programming to system identification.
Genetic classifier systems (Holland 1986; Holland et al. 1986) employ
credit-allocation algorithms along with the genetic algorithm to create a set of
if-then rules to solve problems. Forrest 1991 describes the application of genetic
classifier systems to semantic nets.
There are many papers on evolutionary computation in conference proceedings
from the fields of artificial life (Langton et al. 1989; Langton et al.
1991; Langton 1994), emergent computation (Forrest 1990), and the simulation
of adaptive behavior (Meyer and Wilson 1991; Meyer, Roitblat, and
Wilson 1993).
The three journals Adaptive Behavior, Artificial Life, and Evolutionary Computation, published by The MIT Press, contain articles on various aspects of evolutionary computation.
Kinnear 1994a is an edited collection of papers reporting on recent advances
in genetic programming.
The proceedings of the IEEE World Conference on Computational Intelligence
in Florida on June 26 to July 2, 1994, contain another large group of
papers on genetic programming.
An annotated bibliography of genetic programming appears in
appendix F. Appendix G contains information on an electronic mailing list,
public repository, and FTP site for genetic programming.
Hierarchical Problem-Solving
The goal of automatically solving problems has been a continuing theme since
the beginning of the fields of automatic programming, machine learning, and
artificial intelligence (Nilsson 1980; Winston 1981; Shirai and Tsujii 1982; Rich
1983; Charniak and McDermott 1985; Laird, Rosenbloom, and Newell 1986a,
1986b; Tanimoto 1987; Barr, Cohen, and Feigenbaum 1989; Rosenbloom, Laird,
and Newell 1993).
In the top-down formulation of the three-step hierarchical problem-solving
process, the first step is the identification of the way of decomposing the
overall problem into one or more subproblems. The second step is the solving
of the subproblem(s). The third step is the solving of the overall problem
using the now-available solutions to the subproblems.
We can illustrate additional aspects of the three-step hierarchical
problem-solving process in its top-down formulation with four related examples from
the field of elementary calculus.
Introductory textbooks on differential calculus usually show how to directly
differentiate elementary functions such as $x^2$ or $\sin x$ by calling on first
principles and the definition of the derivative as the limit, as $\Delta x$
approaches zero, of a ratio of the changes, $\Delta y$ and $\Delta x$. However,
as soon as the function $y(x)$ to be differentiated becomes slightly more
complicated (e.g., when $y(x)$ is the product of two functions), it requires
considerable effort to manipulate the cumbersome algebraic expressions
required to find the limiting value of $\Delta y/\Delta x$.
3.1 HIERARCHICAL DECOMPOSITION
Suppose that problem 1 is to differentiate the function $y(x)$, where $y(x)$ is the
product
$y(x) = x^2 \sin x$.
Although it is possible to differentiate a product of two elementary functions
by calling on basic definitions and first principles, it is easier to employ the
three-step hierarchical problem-solving process.
First, one decomposes the problem of differentiating the product, $x^2 \sin x$,
into two subproblems, namely the subproblem of differentiating the first
factor, $x^2$, and the subproblem of differentiating the second factor, $\sin x$.
Figure 3.1 Three-step top-down hierarchical approach applied to problem 1 of differentiating $y(x) = x^2 \sin x$.
Second, one separately solves the two subproblems. As already mentioned,
it is relatively easy to differentiate elementary functions such as $x^2$ or $\sin x$
separately using first principles; the derivatives are $2x$ and $\cos x$, respectively.
Then, in the third step of the hierarchical problem-solving process, one
assembles the solutions to the two subproblems into a solution to the original
problem. When differentiating a product, the assembly involves one addition
and two multiplications. Specifically, the derivative of the product is found
by multiplying the first factor, $x^2$, by the derivative of the second factor (i.e.,
the solution, $\cos x$, to the second subproblem) and then adding the result of
this first multiplication to the result of multiplying the second factor, $\sin x$, by
the derivative of the first factor (i.e., the solution, $2x$, to the first subproblem).
Thus, one obtains
$$\frac{dy(x)}{dx} = x^2 \cos x + 2x \sin x$$
as the solution to problem 1.
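The three steps for problem 1 can be mirrored in a small numerical sketch. It assumes the two subproblem solutions are supplied as ordinary Python functions; the names `d_square`, `d_sin`, and `d_product` are hypothetical.

```python
import math

def d_square(x):
    """Solution to subproblem 1.1: the derivative of x^2 is 2x."""
    return 2 * x

def d_sin(x):
    """Solution to subproblem 1.2: the derivative of sin x is cos x."""
    return math.cos(x)

def d_product(u, du, v, dv):
    """Assembly step: (u*v)' = u*v' + v*u' (one addition, two multiplications)."""
    return lambda x: u(x) * dv(x) + v(x) * du(x)

# Decompose y(x) = x^2 sin x into its two factors, then assemble.
dy = d_product(lambda x: x * x, d_square, math.sin, d_sin)
```

The assembly step never looks inside the factors; it only combines the now-available solutions to the two subproblems.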
Figure 3.1 shows the application of the three-step top-down hierarchical
approach applied to problem 1. The first step is labeled "decompose" and
produces the boxes containing the two subproblems 1.1 and 1.2. The second
step is labeled "solve subproblems" and leads to the boxes containing the
solutions to subproblems 1.1 and 1.2. The third step is labeled "solve original
problem." Solving the original problem requires that one "assemble" the
solutions to the two subproblems into the solution to the overall problem.
The three steps of this problem-solving process are not necessarily obvious
or easy to perform. In particular, the step labeled "decompose" requires the
insight that factoring the given expression in a particular way is productive.
Many decompositions yield subproblems that are much harder to solve than
the original problem. The step labeled "solve subproblems" requires actual
differentiation by computing the limiting values of $\Delta y/\Delta x$ for two expressions.
This step requires some effective mechanism for actually solving problems.
The step labeled "solve original problem" requires finding a way to assemble
the now-available solutions to the subproblems using the available primitive
operations, such as multiplication and addition. Like the second step, this
step requires an effective mechanism for actually solving problems.
Figure 3.2 Three-step top-down hierarchical approach applied to problem 2 of differentiating $f(x) = x^2 \sin x + x^2$.
Reduction in the overall effort required to solve a problem is a motivating
reason for using the three-step hierarchical problem-solving process. If the
decomposition is done astutely, less overall effort is required to do the
decomposition, solve the subproblems, and assemble the solutions to the
subproblems into an overall solution than is required to solve the original
problem directly. The net savings accrues even though the process requires three
separate steps and requires the solution of more separate problems. The
problem of differentiating the product $x^2 \sin x$ entails solving four different
problems using the hierarchical process. One must do the decomposition; one must
separately differentiate the two elementary functions, $x^2$ and $\sin x$; and one
still must solve the overall problem (by assembling the overall solution by
applying one addition and two multiplications to the now-available
derivatives of $x^2$ and $\sin x$). Nevertheless, less total effort is required to grapple
with all four of these separate problems than would be required to apply first
principles and the definition of the derivative to solve the overall problem.
Because of this, hierarchical decomposition can be a way of reducing the total
effort needed to solve an overall problem.
3.2 RECURSIVE APPLICATION AND IDENTICAL REUSE
Now let us consider problem 2 requiring the differentiation of the following
two-term sum:
$f(x) = x^2 \sin x + x^2$.
In applying the three-step hierarchical problem-solving process to problem 2,
we first decompose the problem of differentiating the sum into subproblem 2.1
of differentiating the first addend, $x^2 \sin x$, and subproblem 2.2
of differentiating the second addend, $x^2$, as shown in figure 3.2.
Second, we solve these two component subproblems. Suppose we were
seeing subproblem 2.1 requiring the differentiation of the product $x^2 \sin x$ for
the first time (i.e., we had not just encountered it as problem 1 above).
Subproblem 2.1 is sufficiently difficult that it should be solved by invoking the
entire three-step hierarchical problem-solving process as if it were itself an
original problem. Recursive invocation of the entire three-step hierarchical
problem-solving process is another way of reducing the total effort needed to
solve an overall problem.
When we recursively invoke the entire three-step process on subproblem
2.1, we find that subproblem 2.1 decomposes into sub-subproblem 2.1.1
(differentiating $x^2$) and sub-subproblem 2.1.2 (differentiating $\sin x$). We solve
these two sub-subproblems and assemble their solutions into a solution
of subproblem 2.1, $x^2 \cos x + 2x \sin x$.
If we are alert as we start to solve subproblem 2.2 (differentiating $x^2$), we
will notice that we already differentiated $x^2$ as part of the process of solving
subproblem 2.1 (i.e., as sub-subproblem 2.1.1). It would be much more
efficient to reuse the already-obtained solution to this sub-subproblem than to
solve it again. This identical reuse is another way to reduce the total effort
needed to solve an overall problem.
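Recursive invocation and identical reuse can be sketched together as a small symbolic differentiator that records each subproblem it solves. The tuple encoding of expressions, the `solved` table, and the `log` list are all assumptions introduced for this illustration.

```python
def differentiate(expr, solved, log):
    """Differentiate expr with respect to x, reusing already-solved subproblems."""
    if expr in solved:                      # identical reuse: no new work is done
        log.append(("reuse", expr))
        return solved[expr]
    log.append(("solve", expr))
    kind = expr[0]
    if kind == "sum":                       # decompose a sum into its two addends
        result = ("sum",
                  differentiate(expr[1], solved, log),
                  differentiate(expr[2], solved, log))
    elif kind == "product":                 # recursive use of the three-step process
        u, v = expr[1], expr[2]
        du = differentiate(u, solved, log)
        dv = differentiate(v, solved, log)
        result = ("sum", ("product", u, dv), ("product", v, du))
    elif expr == ("square", "x"):           # elementary function x^2
        result = ("product", 2, "x")
    elif expr == ("sin", "x"):              # elementary function sin x
        result = ("cos", "x")
    else:
        raise ValueError(f"no rule for {expr!r}")
    solved[expr] = result                   # remember the solution for later reuse
    return result

SQUARE = ("square", "x")
SINE = ("sin", "x")
f = ("sum", ("product", SQUARE, SINE), SQUARE)   # x^2 sin x + x^2

log = []
df = differentiate(f, {}, log)
```

In this sketch the log shows four subproblems being solved (problem 2, subproblem 2.1, and sub-subproblems 2.1.1 and 2.1.2), followed by a single reuse when subproblem 2.2 asks again for the derivative of $x^2$.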
The third step in solving problem 2 is to solve the overall problem by
assembling the solutions to subproblems 2.1 and 2.2 into a solution to the
overall problem. When differentiating a sum, the assembly consists of adding
the derivative of the first addend to the derivative of the second addend.
Thus,
$$\frac{df(x)}{dx} = x^2 \cos x + 2x \sin x + 2x$$
is the solution to problem 2 of differentiating the sum $f(x) = x^2 \sin x + x^2$.
Figure 3.2 shows the application of the three-step top-down hierarchical
approach applied to problem 2 of differentiating the sum $f(x) = x^2 \sin x + x^2$.
The decomposition creates subproblems 2.1 (differentiating the first addend
$x^2 \sin x$) and 2.2 (differentiating the second addend $x^2$). This first step is
labeled "decompose problem 2" near the top left of the figure and gives rise
to the two large boxes that dominate the middle of the figure. In the second
step of solving problem 2, subproblems 2.1 and 2.2 are solved. This step is
labeled "solve problem 2's subproblems" near the top middle of the figure.
The third step of solving problem 2 involves assembling the solutions to
subproblems 2.1 and 2.2 into an overall solution. This step is labeled "solve
original problem 2" near the top right.
Subproblem 2.1 (the largest box of figure 3.2) can be most efficiently solved
by recursively invoking the entire three-step problem-solving process on it.
Thus, we insert all three steps shown in figure 3.1 inside the large box labeled
"solve subproblem 2.1." These steps are now relabeled "decompose
subproblem 2.1," "solve subproblem 2.1's sub-subproblems," and "solve
subproblem 2.1." The decomposition of subproblem 2.1 gives rise to
sub-subproblem 2.1.1 (differentiating $x^2$) and sub-subproblem 2.1.2
(differentiating $\sin x$).
The solving of subproblem 2.2 (differentiating $x^2$) can be entirely avoided
by reusing, without modification, the derivative of $x^2$ already obtained in
the process of solving sub-subproblem 2.1.1. This reuse of an already-solved
sub-subproblem is indicated by the gray arrow between "sub-subproblem
2.1.1" and "subproblem 2.2."
The solution to problem 2 is produced by assembling the solutions to
subproblems 2.1 and 2.2. This step is labeled "solve original problem 2" near the
top right of figure 3.2. This step involves solving the original problem by
assembling the now-available solutions to the subproblems.
3.3 PARAMETERIZED REUSE AND GENERALIZATION
Now consider problem 3 of differentiating the sum
$g(x) = x^3 + x^4$.
If we were to proceed unthinkingly in applying the three-step hierarchical
problem-solving process to problem 3, we would first decompose the problem
into the two subproblems of differentiating the two addends. Subproblem 3.1
would require the differentiation of $x^3$, and subproblem 3.2 would require
the differentiation of $x^4$. In this treatment, subproblems 3.1 and 3.2 are two
entirely unrelated subproblems.
Figure 3.3 shows the application of the three-step top-down hierarchical
approach applied to problem 3 of differentiating $y(x) = g(x) = x^3 + x^4$, in which
there are separate subproblems for differentiating $x^3$ and $x^4$. The first step is
labeled "decompose" and produces the boxes containing the two
subproblems 3.1 and 3.2. The second step is labeled "solve subproblems" and leads to
the boxes containing the solutions to subproblem 3.1 (differentiating $x^3$) and
subproblem 3.2 (differentiating $x^4$). The third step is labeled "solve original
problem." This step involves assembling the solutions to subproblems 3.1
and 3.2 to obtain $3x^2 + 4x^3$ as the solution to the overall problem.
Figure 3.3 Three-step top-down hierarchical approach applied to problem 3, differentiating $y(x) = g(x) = x^3 + x^4$, in which there are separate subproblems for differentiating $x^3$ and $x^4$.
However, if we are alert, we will notice that the subproblems 3.1 and 3.2
are similar; they differ only in that the power of $x$ to be differentiated is 3,
rather than 4. It would be preferable to have a general problem-solving
mechanism for differentiating $x^m$ and then invoke this one general mechanism on
two occasions to differentiate $x^3$ and $x^4$. On each of the two invocations, the
general differentiator for $x^m$ would take into account the particular power of
$x$ involved (i.e., 3 or 4). That is, the first invocation of the general
problem-solving mechanism for differentiating $x^m$ would be instantiated with the
argument 3, and the second invocation would be instantiated with the
argument 4.
If a general mechanism is to exploit similarities among subproblems, it is
first necessary to identify the differences between the similar subproblems to
be solved by the general mechanism. Second, it is necessary to communicate
the identified difference to the general mechanism. This is called instantiation.
Third, the general mechanism must appropriately use the communicated
information to solve the particular instance of the class of similar problems.
In this example, the difference between the two subproblems consists of the
single numerical argument (3 versus 4). The value of this argument is the
information that must be communicated. Upon receipt of this information,
the general mechanism for differentiating $x^m$ will use the numerical
argument (3 or 4) to produce the appropriate answer, $3x^2$ or $4x^3$. This process of
parameterized reuse illustrates yet another way to reduce the overall effort
needed to solve a problem. Parameterized reuse corresponds to a generalization
of the problem-solving mechanism.
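Parameterized reuse can be sketched as a single general mechanism that is instantiated twice. The names `d_power`, `d_cube`, `d_fourth`, and `dg` are hypothetical; the rule that the derivative of $x^m$ is $m x^{m-1}$ is the standard power rule.

```python
def d_power(m):
    """General mechanism: the derivative of x**m is m * x**(m - 1)."""
    return lambda x: m * x ** (m - 1)

d_cube = d_power(3)     # first invocation, instantiated with the argument 3
d_fourth = d_power(4)   # second invocation, instantiated with the argument 4

def dg(x):
    """Assemble (by adding) the derivatives of x^3 and x^4."""
    return d_cube(x) + d_fourth(x)
```

The only information communicated to the general mechanism is the value of `m`; everything else about the two subproblems is shared.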
Figure 3.4 shows the application of the three-step top-down hierarchical
approach applied to problem 3 of differentiating $y(x) = g(x) = x^3 + x^4$ in which
there is a general mechanism for differentiating $x^m$. The first step is labeled
"decompose" and produces one subproblem (labeled 3.3) (differentiate
$x^m$), rather than the two subproblems shown in figure 3.3. The second
step is labeled "solve subproblem 3.3" and yields a general mechanism
for differentiating $x^m$. The two subproblems 3.1 and 3.2 of figure 3.3 are
solved in figure 3.4 by means of a parameterized reuse of the general
mechanism for differentiating $x^m$. When this general mechanism is
instantiated with 3, it produces the derivative of $x^3$; and when it is
instantiated with 4, it produces the derivative of $x^4$. The labeled arrows in
figure 3.4 show these instantiations. The third step is labeled "solve original
problem" and assembles (by adding) the derivative of $x^3$ and the
derivative of $x^4$ to create the solution to the overall problem.
In the terminology of computer programming, the two subproblems of
differentiating $x^3$ and $x^4$ are parameterized by $m$. The differentiating
mechanism is a subroutine. The calling program invokes the subroutine
with a particular value of the parameter, $m$. The particular value of the
parameter is communicated to the subroutine as a transmission of the
parameter. The subroutine is written in terms of a dummy variable (formal
parameter) and uses the dummy variable in an appropriate way to
produce its result.
Figure 3.4 Three-step, top-down hierarchical approach applied to problem 3, differentiating $y(x) = g(x) = x^3 + x^4$, in which there is a general mechanism for differentiating $x^m$.
3.4 ABSTRACTION
Now consider problem 4 of differentiating, with respect to the independent
variable $x$, the four-term sum
$h(x) = x^2 \sin x + x^2 + x^3 + \Omega(t)$,
where the independent variable $t$ and the function $\Omega(t)$ do not depend on $x$,
and are not correlated with $x$ in any way. In applying the three-step
hierarchical problem-solving process to problem 4, we first decompose the problem of
differentiating this four-term sum into the four subproblems of differentiating
the four addends. $\Omega(t)$ makes no contribution to the overall mathematical
function that expresses the way $h(x)$ changes in response to changes in the
independent variable $x$. Accordingly, when we solve the fourth subproblem,
we will find that $d\Omega(t)/dx$ is zero. The independent variable $t$ and the
function $\Omega(t)$ make no contribution to the derivative because they are completely
irrelevant to $x$. When certain variables can be identified as being irrelevant to
the solution to a subproblem, the subproblem can be solved without regard
to the values of these irrelevant variables. If we have a mechanism for
differentiating $x^m$ that applies to all values of $x$ and $m$, that mechanism also applies
for all combinations of values of $x$, $m$, and $t$ (where $t$ is an irrelevant
variable). Once a certain variable is identified as being irrelevant to the solution
to a subproblem, the mechanism for solving that subproblem becomes
reusable on all the combinations of the three variables ($x$, $m$, and $t$). The process of
excluding irrelevant information (the abstraction of a problem out of an
environment containing irrelevant variables) makes a solution to a subproblem
applicable to more situations and thereby facilitates reuse of the solutions to
already-solved subproblems and may result in less total effort being required
to solve an overall problem.
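As a worked instance, the four subproblem solutions for problem 4 can be assembled as shown below, writing the fourth addend as $\Omega(t)$. The first three terms reuse results already obtained in problems 1 through 3; the last term vanishes because $\Omega(t)$ does not depend on $x$.

```latex
\begin{align*}
\frac{dh(x)}{dx}
  &= \frac{d}{dx}\left(x^2 \sin x\right) + \frac{d}{dx}x^2
     + \frac{d}{dx}x^3 + \frac{d}{dx}\Omega(t) \\
  &= \left(x^2 \cos x + 2x \sin x\right) + 2x + 3x^2 + 0 .
\end{align*}
```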
The calculus examples above illustrate the five reasons why the hierarchical problem-solving approach is beneficial.
First, when a complex problem is decomposed astutely, less overall effort
is required to decompose a problem into subproblems, solve the
subproblems, and finally assemble the solutions to the subproblems into a solution
to the original problem than is required to solve the original problem directly.
This is the benefit associated with hierarchical decomposition.
Second, the ability to recursively invoke the hierarchical problem-solving
process within the second step of the process brings the benefits of the entire
process to bear within the second step so that a subproblem can be solved
with less effort than if it were solved directly. This is the benefit associated
with recursive application of the hierarchical approach.
Third, if the problem environment contains regularities, and if the
decomposition is done astutely so that a subproblem corresponds to such a
regularity, the solution to the subproblem becomes potentially reusable. When a
particular subproblem repeatedly occurs in an identical way in a problem
environment, the subproblem need not be separately solved each time that it
occurs. Instead, the solution to the subproblem can be reused, without
modification, on each identical recurrence of the subproblem. This is the benefit
associated with identical reuse.
Fourth, if the problem environment contains regularities, the solution to a
subproblem becomes potentially reusable if a solution to a subproblem can
be constructed that solves not just one particular subproblem, but instead
solves a class of similar subproblems. When the differences between
multiple similar occurrences of a particular subproblem can be identified so that
the solution to the subproblem becomes reusable merely by taking the
identified differences into account, the solution to the subproblem becomes a
generalization. This is the benefit associated with reuse with modification or
parameterized reuse. Generalization is a consequence of parameterized reuse.
The method of communicating the identified differences may be direct or
indirect. In the direct method of communication, the differences associated
with an occurrence of a subproblem are explicitly expressed as free
parameters and the particular values of the parameters are explicitly communicated
to the mechanism for solving the subproblem. In the indirect method of
communication, the differences associated with an occurrence of a subproblem
are embodied in the current state of the system and the mechanism for
solving the subproblem merely deals with the state of the system that it
encounters. In the indirect method, communication to the mechanism for solving the
subproblem is implicit through the current state of the world.
Fifth, to the extent that certain variables of the system can be identified as
being irrelevant to the solution of a subproblem, then a solution of a
subproblem can be reused on every combination of the irrelevant variables. Each
solution to a subproblem (whether applicable only to identical situations or a
broader set of similar situations) becomes reusable on a large number of
combinations of variables of the system. This may result in less overall effort being
required to solve the problem. This is the benefit associated with abstraction.
In summary, the five ways that the hierarchical problem-solving approach
reduces the overall effort required to solve a problem arise from the
• efficiency associated with the process of hierarchical decomposition,
• efficiency gained by recursive application of the process of hierarchical
decomposition,
• identical reuse of solutions to already-solved subproblems,
• parameterized reuse (reuse with modification) or generalization of solutions
to similar, but different, subproblems, and
• abstraction of irrelevant variables, which broadens the applicability of the
solutions to subproblems.
The five benefits of the hierarchical problem-solving approach offer
promising ways to gain the leverage that is needed if methods of automatic
programming are ever to be scaled up from small "proof of principle" problems
to large problems.
The alluring benefits of the hierarchical three-step problem-solving
process raise the practical question: How does one go about implementing this
process in an automated and domain-independent way?
From the top-down point of view:
• How does one go about decomposing a problem into subproblems?
• Once the subproblems have been identified, how does one solve the
subproblems?
• Once the subproblems have been identified and solved, how does one
invoke and assemble the solutions of the subproblems into a solution to
the original overall problem?
A similar set of practical questions arises in connection with implementing
the hierarchical three-step problem-solving process from the bottom-up point
of view:
• How does one go about finding regularities at the lowest level of the
problem environment?
• Once the regularities have been identified, how does one recode the original
problem into a new problem in terms of these regularities (i.e., how does
one change the representation)?
• Once the regularities have been identified and the recoding has been done,
how does one solve the original problem as now framed in terms of the
new representation?
3.5 SOAR AND EXPLANATION-BASED GENERALIZATION
SOAR (an acronym for "State, Operator, And Result") is one approach to
applying the three-step hierarchical problem-solving process. SOAR was
developed in the early 1980s at Carnegie Mellon University by John Laird
(now at the University of Michigan), Paul Rosenbloom (now at the University of Southern California), and the late Allen Newell (Laird, Rosenbloom,
and Newell 1986a, 1986b; Rosenbloom, Laird, and Newell 1993).
SOAR is an architecture for general problem solving. It is inspired by its
inventors' views on human cognitive processes. The SOAR architecture has
been used to control autonomous agents. Such agents use available knowledge, solve problems, increase their knowledge by remembering solutions
that they find, and interact with their environment. In addition, SOAR attempts to provide a unified theory of human cognition and a way to model
cognitive data.
The SOAR architecture formulates all goal-oriented behavior of autonomous agents as a search in a problem space. A problem space consists of a
set of states and a set of operators that cause changes in the state of the autonomous agent. A goal is formulated as the task of reaching a desired
state (or states). Satisfying a goal involves starting at the initial state and
applying a sequence of operators that results in reaching the desired
state(s). Interaction with the external environment may occur by means of
perceptual input (e.g., from a vision system) and motor commands (e.g.,
to control a robot arm).
Knowledge is represented as a set of if-then production rules. When the
condition part of an if-then rule matches the current state of the system, the
rule fires. When knowledge is incomplete, there may be no rule that applies.
In that event, the system will not know what operator to apply and the system
will not know how to proceed. When such an impasse occurs, a subgoal is
generated to resolve the impasse. SOAR processes the subgoal as a new problem space. Further impasses may arise in the new problem space, causing the
generation of still more subgoals and problem spaces. The result is a hierarchy of subgoals, each with an associated problem space. In the SOAR literature, this process is known as universal subgoaling.
Subgoals become satisfied when some problem-solving technique solves
the problem. SOAR works in conjunction with various domain independent
methods for solving problems (so-called weak methods). Laird, Rosenbloom,
and Newell 1986a enumerate 17 different weak methods that can be used
with SOAR. These weak methods include generate and test (blind random
search), simple hill climbing, steepest ascent hill climbing, various search techniques (e.g., depth-first search, alpha-beta search, iterative-deepening search),
various techniques of artificial intelligence (e.g., means-ends analysis, constraint
satisfaction, unification), and other techniques. Eventually, the available weak
method may solve the subproblem in the newly created problem space, thereby
satisfying the subgoal.
When a subgoal is satisfied, the solution produced by the weak method is
summarized and remembered in an additional new set of if-then rules, called
chunks. That is, SOAR remembers (caches, learns) the way of satisfying the
subgoal (solving the subproblem). Note that in the SOAR community the
word "learn" has the everyday meaning of "remembering" the solution of a
subproblem, whereas in the machine learning community the word "learn"
has the specialized meaning of "finding" or "discovering" the solution. The
chunks that SOAR has learned (remembered) are then available for subsequent reuse. Both identical reuse and parameterized reuse are contemplated
by SOAR. That is, SOAR can be programmed to do generalization and abstraction. If the system ever again arrives in a state where the rules of a chunk
are applicable, no impasse is generated on this occasion. Consequently, no
subgoal and no new problem space is generated. Instead, the applicable
if-then rules of the chunk fire and the previously discovered solution to the
subgoal is applied to the current situation.
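The cycle of impasse, subgoal, weak method, and chunk can be illustrated with a toy sketch in Python. This is not SOAR's actual implementation: the names `ToyAgent`, `weak_method`, and `chunks` are hypothetical, the problem space (small integers with "inc" and "double" operators) is invented for illustration, and breadth-first search merely stands in for whichever weak method is available. Only identical reuse is shown; parameterized reuse would require generalizing the stored key.

```python
from collections import deque

def weak_method(start, goal, operators, limit=100):
    """Breadth-first search, used here as a domain-independent weak method.
    Returns the list of operator names leading from start to goal, or None."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for name, op in operators.items():
            nxt = op(state)
            if nxt not in seen and 0 <= nxt <= limit:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

class ToyAgent:
    def __init__(self, operators):
        self.operators = operators
        self.chunks = {}    # (state, goal) -> remembered operator sequence
        self.searches = 0   # number of impasses that forced a search

    def solve(self, state, goal):
        key = (state, goal)
        if key in self.chunks:      # a chunk applies, so no impasse arises
            return self.chunks[key]
        self.searches += 1          # impasse: treat as a subgoal and search
        plan = weak_method(state, goal, self.operators)
        self.chunks[key] = plan     # remember (chunk) the solution
        return plan

OPERATORS = {"inc": lambda s: s + 1, "double": lambda s: s * 2}
agent = ToyAgent(OPERATORS)
first = agent.solve(3, 14)    # found by search
second = agent.solve(3, 14)   # identical reuse: answered from the chunk
```

The second call returns the remembered plan without any further search, which is the saving that identical reuse is meant to provide.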
SOAR is a variant of explanation-based generalization (DeJong 1981; DeJong
1983; Winston et al. 1983; Mitchell, Keller, and Kedar-Cabelli 1986; Rosenbloom
and Laird 1986; Minton 1990).
In addition, the pioneering work on search and macros (Fikes, Hart, and
Nilsson 1972; Korf 1980, 1985a, 1985b) serves as an underpinning for some of
the techniques of SOAR. Fikes, Hart, and Nilsson (1972) proposed a process
for saving generalized versions of plans, called "macrops." These plans,
constructed by the STRIPS planning system, were represented in a tabular
format that linked the preconditions of each operator in the plan with other
operators in the plan that established those preconditions. The format allowed
either all or just part of the saved plan to be easily accessed for future use.
Additionally, the plans were generalized by replacing constants that were
specific to the original use of the plan by variables that could be bound differently in subsequent uses. Their generalization process foreshadowed explanation-based generalization.
Consider the eight-puzzle in which there are eight numbered tiles and one
hole within a 3-by-3 grid. The system begins with the eight tiles and one hole
in initial locations within the grid (the initial state). The goal is to relocate the
eight numbered tiles and one hole to the desired locations (the desired state).
The four available operations for changing the current state of the system
involve moving a tile to the left, right, up, or down into the adjacent hole
(thereby causing the hole to end up to the right, left, down, or up, respectively). A solution to the problem consists of a sequence of moving operations
that causes all eight tiles and the hole to end up in their desired locations.
The eight-puzzle can now serve to show how SOAR directly and explicitly
implements the three-step hierarchical problem-solving process. First, the
problem is explicitly decomposed by the user into separate subproblems.
Solving the eight-puzzle in SOAR begins with a clever serial decomposition
(Korf 1985a, 1985b) in which the problem is explicitly decomposed into an
ordered sequence of six subproblems (subgoals). Subproblem k involves moving the tile numbered k to its final desired location with each lower-numbered tile remaining at (or being restored to) its respective desired location.
When the solutions to these six subproblems are executed in consecutive order, the overall effect is that the first six tiles become properly located and the
remaining two tiles and hole are also necessarily in their proper locations.
Second, each subproblem is separately solved by a weak method, such as
iterative-deepening search (Korf 1985b). The solution to a subproblem of
properly locating tile k consists of a sequence of sliding operations discovered by the weak method.
Third, the overall problem is solved by assembling the solutions to the six
subproblems. The assembly consists of executing the six subproblem solutions, once each, in the predetermined consecutive order.
As will be seen starting in the next chapter, the approach for hierarchical
problem-solving used in this book is very different from SOAR, explanation-based generalization, and other techniques of symbolic artificial intelligence.
Chapter 4
Introduction to Automatically Defined Functions - The Two-Boxes Problem
This chapter will use a simple illustrative problem, the two-boxes problem, to
lay the groundwork for the methods that will be used throughout this book.
The two-boxes problem will be stated in section 4.1.
The preparatory steps necessary to solve the two-boxes problem using
genetic programming without automatically defined functions will be presented in section 4.2 and the problem will be solved in section 4.3. (These
sections review the way of applying genetic programming to a problem and
may be skipped by readers already familiar with genetic programming.)
Section 4.4 will describe the idea of a subroutine. The idea of automatically
defined functions will be introduced in section 4.5.
The additional preparation necessary to solve the two-boxes problem
using genetic programming with automatically defined functions will be presented in section 4.6.
Section 4.7 describes how the initial random population is generated with
automatically defined functions. Section 4.8 describes structure-preserving
crossover and the typing required with automatically defined functions.
The two-boxes problem is then solved in section 4.9 using automatically
defined functions.
Section 4.10 will present the methodology for computing the average structural complexity, $S$, of the solutions produced by genetic programming. Then,
the average structural complexity without automatically defined functions,
$S_{without}$, will be compared with the average structural complexity with automatically defined functions, $S_{with}$, for the two-boxes problem.
Section 4.11 will present the methodology for calculating the computational
effort, $E$, for measuring the number of fitness evaluations required to yield a
solution to a problem with a satisfactorily high probability. Then, the computational effort without automatically defined functions, $E_{without}$, will be compared with the computational effort with automatically defined functions,
$E_{with}$, for the two-boxes problem.
4.1 THE PROBLEM
The two-boxes problem has six independent variables, called $L_0$, $W_0$, $H_0$,
$L_1$, $W_1$, and $H_1$, and one dependent variable, called $D$.
Table 4.1 shows 10 fitness cases for the two-boxes problem, each consisting
of a combination of the six independent variables and the associated value of
the dependent variable. The values of the six independent variables appear
in the first six columns of each row. The last column of each row contains
the value of the dependent variable, $D$, that is produced when some as-yet-unknown mathematical expression is applied to the given values of the six
independent variables. For example, the first row of this fitness-case table
shows that when $L_0 = 3$, $W_0 = 4$, $H_0 = 7$, $L_1 = 2$, $W_1 = 5$, and $H_1 = 3$, then the
value of the dependent variable, $D$, is 54.
The two-boxes problem involves finding a computer program (i.e., mathematical expression, composition of primitive functions and terminals) that
produces the observed value of the single dependent variable as its output
when given the values of the six independent variables as input. We call problems of this type symbolic regression because we are seeking a mathematical
expression, in symbolic form, that fits, or approximately fits, a given sample
of data. A symbolic regression problem may also be called a symbolic system
identification problem or a black box problem.
Table 4.1 Fitness-case table for the two-boxes problem showing the value of the
dependent variable, $D$, associated with the values of the six independent variables, $L_0$, $W_0$, $H_0$, $L_1$, $W_1$, and $H_1$.
Fitness case   L0   W0   H0   L1   W1   H1      D
 1              3    4    7    2    5    3     54
 2              7   10    9   10    3    1    600
 3             10    9    4    8    1    6    312
 4              3    9    5    1    6    4    111
 5              4    3    2    7    6    1    -18
 6              3    3    1    9    5    4   -171
 7              5    9    9    1    7    6    363
 8              1    2    9    3    9    2    -36
 9              2    6    8    2    6   10    -24
10              8    1   10    7    5    1     45
Symbolic regression differs from conventional linear regression, quadratic
regression, exponential regression, and other conventional types of regression where the nature of the model is specified in advance by the user. In
conventional linear regression, for example, one is given a set of values of
various independent variable(s) and the corresponding values for the dependent variable(s). The goal is to discover a set of numerical coefficients for a
linear expression involving the independent variable(s) that minimizes some
measure of error (such as the square root of the sum of the squares of the
differences) between the values of the dependent variable(s) computed with
the linear expression and the given values for the dependent variable(s). Similarly, in quadratic regression the goal is to discover a set of numerical
coefficients for a quadratic expression that minimizes the error. It is left to the
user to decide whether to do a linear regression, a quadratic regression, an
exponential regression, or whether to try to fit the data points to some other
type of function. But often, the real problem is deciding what type of model
most appropriately fits the data, not merely computing the appropriate
numerical coefficients after the model has already been chosen. Symbolic
regression searches for both the functional form and the appropriate numeric
coefficients that go with that functional form.
A mere glance at table 4.1 will not disclose the mathematical relationship
between the six independent variables and the one dependent variable. The
relationship is not at all obvious. In fact, the relationship is nonlinear and
cannot be discovered merely by applying conventional linear regression.
Genetic programming provides a way to find a mathematical relationship
(i.e., a computer program) that fits, or approximately fits, this given sample
of data. In fact, the relationship is
$$D = W_0 H_0 L_0 - W_1 H_1 L_1.$$
Figure 4.1 shows two boxes. The relationship among the variables in
table 4.1 represents the difference, $D$, in volume between a first box whose
length, width, and height are $L_0$, $W_0$, and $H_0$, respectively, and a second
box whose length, width, and height are $L_1$, $W_1$, and $H_1$, respectively.
A human programmer writing a computer program to compute the difference in these two volumes in a programming language such as FORTRAN
might write a main program something like
D=W0*L0*H0-W1*L1*H1
PRINT D
If it were understood that the last value computed by a program is its output, then there would be no need for the explicit PRINT statement in the
above FORTRAN program. Similarly, in the LISP programming language, it
is sufficient merely to write the S-expression
(- (* L0 (* W0 H0)) (* L1 (* W1 H1)))
and evaluate the S-expression for its value.
Figure 4.1 Two boxes.
The FORTRAN statement and the S-expression are each a symbolic solution to this system identification problem.
The above computer programs are, of course, very simple in that they produce only a single value. In general, computer programs can return a set of
values, side-effects on a system, or a combination thereof.
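The stated relationship can be checked against the fitness cases with a short Python sketch. The rows below are the ten fitness cases as read from table 4.1; the names `FITNESS_CASES` and `volume_difference` are chosen here for illustration and do not come from the book.

```python
# Fitness cases as read from table 4.1: (L0, W0, H0, L1, W1, H1, D)
FITNESS_CASES = [
    (3, 4, 7, 2, 5, 3, 54),
    (7, 10, 9, 10, 3, 1, 600),
    (10, 9, 4, 8, 1, 6, 312),
    (3, 9, 5, 1, 6, 4, 111),
    (4, 3, 2, 7, 6, 1, -18),
    (3, 3, 1, 9, 5, 4, -171),
    (5, 9, 9, 1, 7, 6, 363),
    (1, 2, 9, 3, 9, 2, -36),
    (2, 6, 8, 2, 6, 10, -24),
    (8, 1, 10, 7, 5, 1, 45),
]

def volume_difference(l0, w0, h0, l1, w1, h1):
    """Volume of the first box minus volume of the second box."""
    return w0 * h0 * l0 - w1 * h1 * l1

# Rows for which the formula disagrees with the tabulated D (expected: none).
mismatches = [case for case in FITNESS_CASES
              if volume_difference(*case[:6]) != case[6]]

# Average magnitude of the dependent variable over the ten cases.
average_magnitude = sum(abs(case[6]) for case in FITNESS_CASES) / len(FITNESS_CASES)
```

Every row agrees with the difference in volumes, and the average magnitude of $D$ over the ten rows is 173.4.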
4.2 PREPARATORY STEPS WITHOUT ADFs
This section applies genetic programming without automatically defined functions to the two-boxes problem.
As already mentioned, the five major preparatory steps in applying
genetic programming to a problem involve determining
(1) the set of terminals,
(2) the set of primitive functions,
(3) the fitness measure,
(4) the parameters for controlling the run, and
(5) the method for designating a result and the criterion for terminating
a run.
The first major step in preparing to use genetic programming is to identify
the set of terminals. The terminals can be viewed as the inputs to the as-yet-undiscovered computer program. The terminals from the terminal set, along
with functions from the function set, are the ingredients from which genetic
programming attempts to construct a computer program to solve, or approximately solve, the problem. The terminals for this problem are the six independent variables and the terminal set, T, is
T = {L0, W0, H0, L1, W1, H1}.
The second major step in preparing to use genetic programming is to
identify the set of functions that are to be used to generate the mathematical expression that attempts to fit the given finite sample of data. A
reasonable choice might be the function set consisting of the ordinary two-argument arithmetic operations of addition, subtraction, and multiplication along with a version of division that is protected against divisions by
zero. The protected division function % takes two arguments and returns the
number 1 when division by 0 is attempted (including 0 divided by 0), and,
otherwise, returns the normal quotient. Therefore, the function set, F, for
this problem is
F = {+, -, *, %}.
An argument map is associated with each set of functions. The argument
map of a set of functions is the list containing the number of arguments required
by each function. Thus, the argument map for the function set, F, is
{2, 2, 2, 2}.
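A minimal Python sketch of the function set and its argument map follows. The names `protected_division`, `FUNCTION_SET`, and `ARGUMENT_MAP` are illustrative choices, not identifiers from the book; the behavior of % follows the description above.

```python
import operator

def protected_division(numerator, denominator):
    """Return 1 when division by 0 is attempted (including 0 / 0);
    otherwise return the normal quotient."""
    if denominator == 0:
        return 1
    return numerator / denominator

# Each entry maps a function symbol to (implementation, number of arguments).
FUNCTION_SET = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "%": (protected_division, 2),
}

# The argument map lists the number of arguments of each function, in order.
ARGUMENT_MAP = [arity for _, arity in FUNCTION_SET.values()]
```

Because every function accepts any numeric arguments and always returns a number, any composition of these functions and the six terminals can be evaluated.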
The protected division function ensures, as a practical matter, that the
function set, F, satisfies the closure requirement for this particular problem. However, the potential of an overflow or underflow always exists
whenever any arithmetic operation (including addition, subtraction, or
multiplication) is performed on a computer (as discussed further in
section 11.2).
Each computer program is a composition of functions from the function
set, F, and terminals from the terminal set, T. In this problem, the output of
any program composed of these functions and terminals is intended to correspond directly to the value of the dependent variable, $D$, of this problem.
Therefore, there is no need for a wrapper (output interface) to further modify
the output of the program for this problem.
The third major step in preparing to use genetic programming is identifying the fitness measure. Fitness is typically measured over a number of
different fitness cases. There are 10 fitness cases for this problem, each
consisting of a combination of the six independent variables, $L_0$, $W_0$, $H_0$,
$L_1$, $W_1$, and $H_1$, and the associated value of the dependent variable, $D$.
In defining fitness for a problem, we start with a definition of raw fitness
stated in terms natural to the problem domain. The raw fitness for this problem is the sum, taken over the 10 fitness cases, of the absolute value of the
difference (error) between the value produced by a program for the six given
values of the independent variables and the correct value for the dependent
variable $D$. The closer this sum of errors is to 0, the better the program. Standardized fitness (described in detail in Genetic Programming, subsection 6.3.2) is
the zero-based fitness measure actually used by genetic programming. Since
better programs have a smaller value of raw fitness and since a 100%-correct
program would have a raw fitness of 0 for this problem, standardized fitness
is the same as raw fitness for this problem.
Since every computer program in the population returns a numerical value,
it is always possible to compute the fitness of any program. Therefore, this
fitness measure satisfies the requirement of being fully defined for any program that might arise in the population.
The hits measure for this problem counts the number of fitness cases for
which the numerical value returned by the program comes within a small
tolerance (called the hits criterion) of the correct value. The hits criterion for
this problem is 0.01.
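The raw fitness and hits measures can be sketched in Python as follows. For brevity the sketch uses only the first three rows of table 4.1, and it represents a candidate program as an ordinary Python function; both choices, and the names used, are assumptions made for illustration.

```python
# First three fitness cases of table 4.1: (L0, W0, H0, L1, W1, H1, D)
CASES = [
    (3, 4, 7, 2, 5, 3, 54),
    (7, 10, 9, 10, 3, 1, 600),
    (10, 9, 4, 8, 1, 6, 312),
]
HITS_CRITERION = 0.01

def raw_fitness_and_hits(program, cases, hits_criterion=HITS_CRITERION):
    """Return (sum of absolute errors, number of cases within the criterion)."""
    total_error = 0.0
    hits = 0
    for *inputs, target in cases:
        error = abs(program(*inputs) - target)
        total_error += error
        if error < hits_criterion:
            hits += 1
    return total_error, hits

def correct_program(l0, w0, h0, l1, w1, h1):
    return w0 * h0 * l0 - w1 * h1 * l1

def first_box_only(l0, w0, h0, l1, w1, h1):
    # A deliberately wrong candidate that ignores the second box.
    return w0 * h0 * l0

perfect = raw_fitness_and_hits(correct_program, CASES)
flawed = raw_fitness_and_hits(first_box_only, CASES)
```

The correct program has a raw (and standardized) fitness of 0 and scores a hit on every case, whereas the flawed candidate accumulates the volume of the second box as error on each case and scores no hits.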
The fourth major step in preparing to use genetic programming involves
determining the values of certain parameters to control the runs.
The two major parameters for controlling a run of genetic programming
are the population size, $M$, and the maximum number of generations to be
run, $G$. The default value for the population size, $M$, is 4,000 for this book
and the default value for the maximum number of generations to be run,
$G$, is 51 (i.e., generation 0 with 50 additional generations). Depending on
the complexity of the problem, populations of 1,000, 8,000, or 16,000 are
used for some problems. A few problems are run for only 21 generations
because of time constraints.
In addition to the two major parameters for controlling runs, 19 additional
minor parameters control runs of genetic programming. The default values for
the minor parameters are detailed in appendix D.
The fifth major step in preparing to use genetic programming involves specifying the method for designating a result and the criterion for terminating a
run. The termination criterion for a problem is triggered either by running
the specified maximum number of generations, $G$, or by the satisfaction of a
problem-specific success predicate by at least one program in the population.
The success predicate for this problem is that a program scores the maximum
number of hits (i.e., 10). This occurs when each of the 10 values returned by a
genetically evolved program for the 10 combinations of the six independent
variables comes within 0.01 of the associated value of the dependent variable, $D$. In other words, this success predicate considers an approximate solution to be a satisfactory result for this problem. If we had specified that the
success predicate consisted of achievement of a value of standardized fitness
of exactly 0, then only an exact solution would be considered to be a satisfactory result. We designate the best-so-far individual as the result of a run of
genetic programming.
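The termination criterion and the best-so-far method of result designation can be sketched in Python as below. The function names and the representation of an individual as a `(standardized_fitness, program)` pair are assumptions made for illustration, as is the convention that generations are numbered from 0 so that $G = 51$ covers generations 0 through 50.

```python
def run_should_terminate(generation, best_hits,
                         max_generations=51, n_fitness_cases=10):
    """True if the success predicate is satisfied or the last
    permitted generation (numbered from 0) has been reached."""
    success = best_hits == n_fitness_cases
    out_of_generations = generation >= max_generations - 1
    return success or out_of_generations

def update_best_so_far(best_so_far, candidate):
    """Each argument is a (standardized_fitness, program) pair, or None
    for 'nothing seen yet'. Smaller standardized fitness is better."""
    if best_so_far is None or candidate[0] < best_so_far[0]:
        return candidate
    return best_so_far
```

For example, a run whose best program scores 10 hits in generation 11 stops there, while a run that never scores 10 hits stops after generation 50.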
The function set for a problem should be chosen so that it is capable of
solving the problem. Mathematical expressions composed of addition, subtraction, multiplication, and division are certainly capable of approximating
a given set of numerical data. Since this problem requires finding a program
that approximately fits the given data (reflected by the success predicate merely
requiring the scoring of 10 hits), it is reasonable to believe that the function
set, F, satisfies the sufficiency requirement. However, in general the sufficiency of a function set depends on both the function set and the success
predicate for the problem. For example, if the success criterion for the problem required attainment of a value of standardized fitness of exactly zero
(thus requiring an algebraically correct solution to the problem), then we would
be less certain that the function set, F, satisfies the sufficiency requirement
(absent additional knowledge about the characteristics of the source of the
given data).
Table 4.2 summarizes the key features of the two-boxes problem when
automatically defined functions are not being used. We call this table (and the
15 similar tables in this book) the tableau without ADFs for the problem. Each
such tableau without ADFs summarizes the main choices made while applying the five major preparatory steps of genetic programming. A supplementary tableau with ADFs will be presented later.
The second and third rows of each tableau without ADFs correspond to the
first and second major preparatory steps for genetic programming and summarize the choices for the terminal set and function set, respectively, for the
problem. The choice of the terminal set and function set determines whether
a wrapper (shown in the eighth row) is needed for a particular problem.
The fourth through seventh rows of each tableau without ADFs relate to
the third major preparatory step and present the choices made concerning
the fitness measure for the problem.
Table 4.2 Tableau without ADFs for the two-boxes problem.
Objective: Find a program that produces the observed value of the
single dependent variable, D, as its output when given
the values of the six independent variables as input.
Terminal set without ADFs: The six actual variables of the problem, L0, W0, H0,
L1, W1, and H1.
Function set without ADFs: +, -, *, and %.
Fitness cases: 10 combinations of random integers between 1 and 10
for the six independent variables L0, W0, H0, L1, W1,
and H1.
Raw fitness: The sum, over the 10 fitness cases, of the absolute value
of the error between the value returned by the program
and the observed value of the dependent variable.
Standardized fitness: Same as raw fitness.
Hits: The number of fitness cases (out of 10) for which the
absolute value of the error is less than 0.01 (the hits
criterion).
Wrapper: None.
Parameters: M = 4,000. G = 51.
Different fitness cases are chosen for each run.
Success predicate: A program scores the maximum number (i.e., 10) of hits.
The ninth row of each tableau without ADFs corresponds to the fourth
major preparatory step and presents the control parameters for the problem.
This row always includes the two major parameters of population size, $M$,
and the maximum number of generations to be run, $G$. The 19 minor numerical and qualitative control parameters are generally not specifically mentioned
in the tableau unless they differ from the default values (appendix D). For
this particular problem, a different set of randomly created fitness cases is
created for each separate run.
The tenth row of each tableau without ADFs relates to the fifth major preparatory step. The method of result designation used throughout this book is
the best-so-far method. The termination criterion used throughout this book
is a disjunction based on completing the maximum number of generations to
be run, $G$, and satisfaction of a problem-specific success predicate. Only the
success predicate is specifically mentioned in the tableau.
4.3 RESULTS WITHOUT ADFs
Now that we have completed the five major steps for preparing to use genetic
programming, we will describe a run of genetic programming without
automatically defined functions for the two-boxes problem.
A run of genetic programming for this problem starts with the creation of a
population of 4,000 random computer programs, each composed from the
available functions (+, -, *, and %) from the function set, F, and the available
terminals (L0, W0, H0, L1, W1, and H1) from the terminal set, T. The process of
creating the initial random generation is specified by means of a computer
program in appendix E of this book and described in detail in Genetic Programming (section 6.2).
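The following Python sketch gives a rough idea of how random programs can be composed from the function set and terminal set. It is not the procedure of appendix E: it is a simplified "grow"-style generator, and the depth limit of 5, the 0.3 probability of choosing a terminal early, the tiny population of 20, and all names are assumptions made for illustration. Programs are represented as nested tuples of the form `(function, argument1, argument2)`.

```python
import random

TERMINALS = ["L0", "W0", "H0", "L1", "W1", "H1"]
FUNCTIONS = ["+", "-", "*", "%"]   # every function takes two arguments

def random_program(max_depth, rng):
    """Grow a random expression tree as nested tuples."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            random_program(max_depth - 1, rng),
            random_program(max_depth - 1, rng))

def count_points(tree):
    """Number of points (functions plus terminals) in the tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_points(arg) for arg in tree[1:])

def to_s_expression(tree):
    """Render the tree in LISP-style prefix notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_s_expression(a) for a in tree[1:]]) + ")"

rng = random.Random(0)   # fixed seed so the sketch is repeatable
population = [random_program(max_depth=5, rng=rng) for _ in range(20)]
```

Because every function here takes exactly two arguments, each program has one more terminal than it has functions, so its number of points is always odd.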
The 4,000 randomly generated individuals found in the initial generation
of the population are, in effect, a blind random search of the space of computer programs representing possible solutions to this problem.
The results of such a blind random search are not very good. The worst-of-generation program in the population for generation 0 has the enormous error
(fitness) of 3,093,623. This is an average deviation of 309,362 between the value
produced by this computer program and the correct value of $D$ (whose average magnitude is only 173.4 in table 4.1). This individual is shown below:
(* (* (+ (* H1 W0) (* H1 W0)) (+ (* H0 W1) (* H1 H1))) (- (* (% L1
L1) (- L0 W1)) (* (* W1 W1) (- W0 H1))))
However, even in a randomly created population of programs, some individuals are better than others.
The average fitness of the population as a whole (the mean) for generation 0 is
1,195,092 (only about a third of the fitness of the worst). The mean for generation 0 can reasonably be viewed as a baseline value for a blind random search
of the program space of this problem.
The median (2,000th best) individual of the population for
generation 0 has a fitness of 1,571.8 and is
(% (- L1 W0) (+ (* W1 H0) W1))
which is equivalent to
$$\frac{L_1 - W_0}{W_1 + W_1 H_0}.$$
The fitness of the median individual for this problem is considerably better
(i.e., smaller) than the average fitness of the population as a whole because
the average is significantly raised by a few extremely unfit individuals in the
poorest percentiles of the population.
The best individual from generation 0 has a standardized fitness of 783.
The average error between the correct value of $D$ and the value of the output,
$D$, produced by this program is 78.3. This average error is about 45% of the
average magnitude (173.4) of the 10 values of $D$ in table 4.1, so the performance of this best individual from generation 0 must be viewed as being very
bad; nonetheless, this error is better than the error produced by the other
3,999 random individuals in generation 0.
The best-of-generation program in generation 0 of the population (hereafter
often referred to as the best of the specified generation) is
(* (- (- W0 L1) (- W1 H0)) (+ (- H0 H0) (* H0 L0))).
This program has seven functions and eight terminals and thus has 15 points.
It is equivalent to
$$H_0 L_0 (W_0 + H_0 - W_1 - L_1).$$
This expression bears little resemblance to the correct mathematical
expression for solving the two-boxes problem. Like many randomly generated
individuals, this program is blind to several of the independent
variables which are needed to solve the problem correctly. This individual
does not contain H1.
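The point count and the simplification of the best-of-generation program from generation 0 can be checked with a small Python sketch. The nested-tuple representation and the helper names are assumptions made for illustration, and the program is evaluated on the first row of table 4.1 purely as an example input (the run described here used its own randomly created fitness cases).

```python
import operator

def protected_division(a, b):
    return 1 if b == 0 else a / b

FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "%": protected_division}

# (* (- (- W0 L1) (- W1 H0)) (+ (- H0 H0) (* H0 L0)))
BEST_OF_GEN_0 = ("*",
                 ("-", ("-", "W0", "L1"), ("-", "W1", "H0")),
                 ("+", ("-", "H0", "H0"), ("*", "H0", "L0")))

def evaluate(tree, env):
    """Evaluate a nested-tuple program given values for the terminals."""
    if isinstance(tree, str):
        return env[tree]
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, env), evaluate(right, env))

def count_nodes(tree):
    """Return (number of functions, number of terminals)."""
    if isinstance(tree, str):
        return (0, 1)
    functions, terminals = 1, 0
    for arg in tree[1:]:
        f, t = count_nodes(arg)
        functions += f
        terminals += t
    return (functions, terminals)

def terminals_used(tree):
    """Set of distinct terminal names appearing in the program."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(terminals_used(arg) for arg in tree[1:]))

env = {"L0": 3, "W0": 4, "H0": 7, "L1": 2, "W1": 5, "H1": 3}
value = evaluate(BEST_OF_GEN_0, env)
simplified = env["H0"] * env["L0"] * (env["W0"] + env["H0"] - env["W1"] - env["L1"])
```

The program and its simplified form agree on this input, the counts come to seven functions and eight terminals, and `terminals_used` confirms that H1 never appears.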
Throughout this book, we frequently display individuals and present
statistics from generation 0 in order to show the prohibitive difficulty of finding the solution to the problem at hand merely by means of blind random
search and in order to give the reader a sense of the general appearance of
random computer programs that are composed of the available primitive
functions and terminals for the particular problem domain.
The Darwinian reproduction operation is then applied a certain number of
times to single individuals selected from the population on the basis of their
fitness (with reselection allowed). In addition, the genetic crossover operation is then applied to a certain number of pairs of parents selected from the
current population on the basis of their fitness (with reselection allowed) to
breed a new population of programs. Throughout this book, the number of
reproduction operations performed for each generation is equal to $p_r$ = 10%
of the population size (i.e., 400 for a population of size 4,000). The number of
crossover operations is equal to $p_c$ = 45% of the population size (i.e., 1,800
crossovers involving 3,600 individuals and producing 3,600 offspring).
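The bookkeeping above, together with the basic idea of exchanging subtrees between two parents, can be sketched in Python. This is a simplified illustration under stated assumptions: crossover points are chosen uniformly at random over all points of each parent (the book's default parameters for choosing crossover points are given in appendix D and are not reproduced here), fitness-based selection of parents is omitted, and all names are hypothetical.

```python
import random

def points(tree):
    """Number of points (functions plus terminals) in a nested-tuple program."""
    return 1 if isinstance(tree, str) else 1 + sum(points(a) for a in tree[1:])

def all_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every point in the tree."""
    yield path
    if not isinstance(tree, str):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen subtree between the two parents."""
    path_a = rng.choice(list(all_paths(parent_a)))
    path_b = rng.choice(list(all_paths(parent_b)))
    child_a = replace_subtree(parent_a, path_a, get_subtree(parent_b, path_b))
    child_b = replace_subtree(parent_b, path_b, get_subtree(parent_a, path_a))
    return child_a, child_b

M = 4000
n_reproductions = M * 10 // 100                  # 10% of the population
n_crossovers = M * 45 // 100                     # each crossover uses two parents
n_offspring = n_reproductions + 2 * n_crossovers  # size of the new population
```

With these percentages, 400 reproductions plus 1,800 crossovers (3,600 offspring) refill a population of exactly 4,000. Since crossover only exchanges subtrees, the two offspring together always contain the same total number of points as their two parents.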
The vast majority of the offspring in the newly created generation 1 are,
like their parents from generation 0, highly unfit. However, some of the offspring may be slightly more fit than their parents.
Figure 4.2 presents the fitness curves for this run showing, by generation,
the standardized fitness of the best-of-generation program, the standardized
fitness of the worst-of-generation program, and the average of the standardized fitness for the population as a whole. The figure starts at generation 0
and ends at the generation on which a 100%-correct solution was evolved on
this particular run (i.e., generation 11). Standardized fitness is shown here on
a logarithmic scale since the standardized fitness of both the worst-of-generation program and the average of the standardized fitness for the population
as a whole are typically very large for problems of symbolic regression. Since
the standardized fitness of the 100%-correct program evolved in generation
11 is zero, the final point is not plotted on this logarithmically scaled graph.
As a run of genetic programming continues from generation to generation,
we typically observe a generally monotonic improvement (i.e., a decrease) in
the average standardized fitness of the population as a whole and in the standardized fitness of the best-of-generation individual. For example, the standardized fitness of the best-of-generation program progressively improves to
778, 510, 138, 117, 53, and 51 between generations 2 and 7 of this run.
Figure 4.2 Fitness curves for the two-boxes problem without ADFs. The plot shows the worst-of-generation, average, and best-of-generation standardized fitness (logarithmic scale) by generation.
In generation 8, the standardized fitness of the best-of-generation program
improves to 4.44, so the average error is now only 0.444 per fitness case (versus 78.3 per fitness case for the best of generation 0). This average error is only
0.2% of the average magnitude of the 10 values of D in table 4.1. This program
has 27 points and is
(- (- (* (* W0 H0) L0) (* (* L1 H1) W1))
   (% (+ W0 L0) (- (- L0 W1) (+ (+ W1 L1) (* L1 W1))))).
This individual is equivalent to
$W_0 H_0 L_0 - W_1 L_1 H_1 - \frac{W_0 + L_0}{L_0 - 2W_1 - L_1 - L_1 W_1}$.
As can be seen, the first two terms of this expression correspond to what we
know to be an algebraically correct solution to this problem, while the third
term is an extraneous and erroneous term.
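The decomposition into two correct terms and one extraneous term can be checked numerically. The following Python sketch transcribes the 27-point program directly and compares it with the difference of the two volumes. The fitness case is hypothetical (it is not taken from table 4.1), and the convention that protected division returns 1 for a zero divisor is an assumption of this sketch.

```python
def pdiv(a, b):
    """Protected division; returning 1 for a zero divisor is an assumed convention."""
    return 1.0 if b == 0 else a / b

def target(L0, W0, H0, L1, W1, H1):
    # Difference between the volumes of the two boxes.
    return W0 * H0 * L0 - W1 * L1 * H1

def best_of_gen8(L0, W0, H0, L1, W1, H1):
    # Direct transcription of the 27-point S-expression.
    return ((W0 * H0) * L0 - (L1 * H1) * W1) - pdiv(
        W0 + L0, (L0 - W1) - ((W1 + L1) + (L1 * W1)))

case = dict(L0=3, W0=4, H0=7, L1=2, W1=5, H1=3)   # hypothetical fitness case
error = abs(best_of_gen8(**case) - target(**case))
extraneous = abs(pdiv(case["W0"] + case["L0"],
                      case["L0"] - 2 * case["W1"] - case["L1"]
                      - case["L1"] * case["W1"]))
print(error, extraneous)
```

For any fitness case, the error of this individual is exactly the magnitude of the third term.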
In generations 9 and 10, standardized fitness further improves to 1.10 and
0.65, respectively.
In generation 11, the best-of-generation program achieves a standardized
fitness of 0. This 11-point program is
(- (* (* W0 H0) L0) (* (* L1 H1) W1))
which is equivalent to
$W_0 H_0 L_0 - W_1 L_1 H_1$.
This program (which we can recognize as an algebraically correct solution to
the problem) scores 10 hits because its error is less than 0.01 (the hits criterion)
for all 10 fitness cases. A program that scores 10 hits satisfies the success predicate of this problem and causes this run to be terminated at generation 11
(rather than continuing on to generation 50). This best-of-generation program
is, therefore, also the best-of-run program and the best-so-far program.
Figure 4.3 shows this 100%-correct best-of-run individual from generation
11 as a rooted, point-labeled tree with ordered branches.
Figure 4.3 100%-correct best-of-run program from generation 11 for the two-boxes problem without ADFs.
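The relationship between standardized fitness and hits can be sketched in Python. This sketch assumes that standardized fitness is the sum of the absolute errors over the fitness cases, which is consistent with the average error of 0.444 quoted above for a standardized fitness of 4.44 over 10 cases. The fitness cases and function names are hypothetical.

```python
def standardized_fitness_and_hits(program, cases, hit_tolerance=0.01):
    """Return (sum of absolute errors, number of cases with error < tolerance)."""
    errors = [abs(program(*inputs) - d) for inputs, d in cases]
    return sum(errors), sum(e < hit_tolerance for e in errors)

def best_of_run(L0, W0, H0, L1, W1, H1):
    # The 11-point program from generation 11.
    return (W0 * H0) * L0 - (L1 * H1) * W1

# Hypothetical fitness cases (not table 4.1), each as (inputs, target D).
inputs = [(3, 4, 7, 2, 5, 3), (7, 10, 9, 10, 3, 1), (10, 9, 4, 8, 1, 6)]
cases = [(x, x[0] * x[1] * x[2] - x[3] * x[4] * x[5]) for x in inputs]

fitness, hits = standardized_fitness_and_hits(best_of_run, cases)
print(fitness, hits)
```

A run would terminate as soon as the number of hits equals the number of fitness cases.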
The best of generation 8 differs from the best of generation 11 by the erroneous and extraneous subtractive term. This similarity does not mean that
the best of generation 8 is necessarily one of the (up to) eight ancestors of the
best of generation 11; a genealogical audit trail can be used to determine this.
We define the variety of a population at a given generation to be the fraction
of the programs in that population that are different from every other program in the population. Variety is determined by using the LISP function
EQUAL, which considers two programs to be the same if they have exactly the
same tree structure and exactly the same labeling of the points of the tree with
functions and terminals. A value of variety of 100% indicates that all programs in the population are different.
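A minimal Python sketch of this measure follows. It reads the definition literally (a program counts only if no other program in the population is structurally equal to it) and represents programs as nested tuples, whose equality compares structure and labels much as EQUAL does for LISP lists. The representation and the sample population are assumptions of the sketch.

```python
from collections import Counter

def variety(population):
    """Fraction of programs that differ from every other program in the population."""
    counts = Counter(population)
    return sum(1 for p in population if counts[p] == 1) / len(population)

# Hypothetical population of four programs, two of which are duplicates.
pop = [("*", "W0", "H0"), ("*", "W0", "H0"), ("+", "L0", "W1"), ("-", "L1", "H1")]
print(variety(pop))  # 0.5
```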
Figure 4.4 shows the variety curve for the population for the two-boxes problem. Variety starts at 1.00 at generation 0 because duplicates are eliminated
when the initial random population is created. Variety fluctuates around 0.85
for most of this particular run.
The solution that evolved in this particular run of this simple problem
happened to be an algebraically correct and parsimonious solution to the
problem at hand. However, genetic programming does not, in general, produce such solutions in problems of symbolic regression. Instead, genetic
programming typically evolves relatively large programs that are good
approximations to the data.
Figure 4.4 Variety curve for the two-boxes problem without ADFs.
4.4 THE IDEA OF SUBROUTINES
A human programmer writing a program for the two-boxes problem would
probably notice the symmetry and regularity of the mathematical expression
$W_0 H_0 L_0 - W_1 L_1 H_1$.
This expression contains a multiplication of three numbers in two places. The
physical interpretation of this regularity is as a computation of volume.
Regularities and symmetries in a problem environment can often be
exploited in solving a problem. An alternative way of writing the program
for the two-boxes problem involves first writing a subroutine (defined function, subprogram, procedure) for the common calculation and then repeatedly calling the subroutine from a main program. The six lines of code below
in the LISP programming language contain a two-line defined function and a
one-line main program:
1 ;;;- definition of the three-argument function "volume"-
2 (progn (defun volume (arg0 arg1 arg2)
3          (values (* arg0 (* arg1 arg2))))
4 ;;;- main program for computing the difference
5 ;;;  of two volumes-
6        (values (- (volume L0 W0 H0) (volume L1 W1 H1))))
Lines 1, 4, and 5 contain comments (indicated by semicolons) informing us
that a subroutine called volume follows on lines 2 and 3 and that a result-producing main program follows on line 6.
Lines 2 and 3 contain the definition of a function (called a defun in LISP).
A defun declaration does four things.
First, the defun (line 2) assigns a name, volume, to the function being
defined. The name permits subsequent reference to the function by a calling
program (line 6).
Second, the defun (line 2) identifies the argument list of the function being
defined. In this defun, the argument list is the list, (arg0 arg1 arg2),
containing the three dummy variables arg0, arg1, and arg2. These three
dummy variables (also known as formal parameters) are entirely local to the
function being defined (lines 2 and 3) and do not appear at all in the result-producing main program (line 6).
Third, the defun contains a body (line 3) that performs the work of
the function. The work here consists of the multiplication of the three
dummy variables, arg0, arg1, and arg2, using two invocations of the
two-argument primitive function of multiplication (*). The body of the
function being defined does not have access to the actual variables of the
problem, L0, W0, H0, L1, W1, and H1. Instead, it operates only with the
three dummy variables that are local to the function definition.
Fourth, the defun identifies the value to be returned by the function. In
this example, the single value to be returned (i.e., the product of the three
dummy variables arg0, arg1, and arg2) is highlighted with an explicit
invocation of the values function (line 3). LISP programmers do not ordinarily use the values function in this overt manner; however, we use it
throughout this book to highlight the value(s) being returned by each defined function (and the result-producing main program). Some programming
languages have a statement called return for identifying the value to be
returned by a subroutine; others require the programmer to assign the value
to be returned to a special variable with the same name as the function.
Line 6 contains the result-producing main program. The main program
calls the defined function volume twice and then assembles the values
returned by the two invocations of the defined function volume. Specifically,
the assembly consists of subtracting the two values returned by the function
volume. The main program does not have access to the dummy variables
arg0, arg1, and arg2; they are entirely local to the defined function
volume. Instead, the main program calls the function volume using the actual
variables of the problem. When the main program calls volume the first time,
the three dummy variables, arg0, arg1, and arg2, are instantiated with the
particular values, L0, W0, and H0, respectively, of the actual variables of the
problem. Then, when the main program calls volume the second time, the
three dummy variables are instantiated with the values, L1, W1, and H1,
respectively. Finally, the body of the main program performs the work of subtracting the two volumes. The single value to be returned by the main program in line 6 is highlighted with an explicit invocation of the values function.
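For readers more familiar with other languages, the same division of labor between a defined function and a main program can be sketched in Python. This is an analogy rather than a translation of the LISP program: the name main_program and the sample dimensions are assumptions, and Python's return statement plays the role that the explicit values function plays in the LISP code.

```python
def volume(arg0, arg1, arg2):
    # arg0, arg1, and arg2 are dummy variables, local to this definition.
    return arg0 * (arg1 * arg2)

def main_program(L0, W0, H0, L1, W1, H1):
    # The main program works with the actual variables of the problem and
    # never refers to the dummy variables of volume.
    return volume(L0, W0, H0) - volume(L1, W1, H1)

print(main_program(3, 4, 7, 2, 5, 3))  # 54
```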
The Common LISP function progn evaluates each of its arguments
sequentially and returns the result of evaluating its last argument. When
the six lines above are evaluated in LISP, the progn on line 2 causes the
sequential evaluation of the function-defining branch (lines 2 and 3) and the
result-producing branch (line 6). The progn starts by evaluating its first
argument, namely the function-defining branch. When a defun is evaluated in LISP, the function involved becomes defined and the defun returns
the name (i.e., volume) of the function just defined. Since the progn returns
only the result of the evaluation of its last argument, the value returned
by the defun in the first branch is lost (inconsequentially). The progn
now evaluates its second branch, namely the result-producing branch. The
result-producing branch calls the now-defined function volume twice and
does a subtraction. Since this second branch is the last argument of the
progn, the value returned by the overall six-line program consists of the
numerical value returned by the values function associated with the
result-producing branch. For this reason, the result-producing branch may
also be referred to as the value-returning branch.
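The evaluation order described above can be imitated in Python. The sketch below is only a model of the behavior of progn and defun as described in the text: each branch is represented by a zero-argument callable, and the dictionary env, the helper names, and the sample dimensions are assumptions.

```python
def progn(*forms):
    """Evaluate zero-argument callables in order and return the last result."""
    result = None
    for form in forms:
        result = form()
    return result

env = {}   # stands in for the table of currently defined functions

def function_defining_branch():
    env["volume"] = lambda arg0, arg1, arg2: arg0 * (arg1 * arg2)
    return "volume"   # like defun, returns the name just defined

def result_producing_branch():
    volume = env["volume"]   # available only because the first branch ran
    return volume(3, 4, 7) - volume(2, 5, 3)

print(progn(function_defining_branch, result_producing_branch))  # 54
```

The name returned by the first branch is discarded, and only the value of the last branch is returned.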
Figure 4.5 An overall program consisting of a function-defining branch for the function volume
and a result-producing branch that computes the difference between the volumes of two boxes.
Figure 4.5 shows the overall structure of the above six-line program for the
two-boxes problem. The function-defining branch (containing the defun)
appears in the left part of this figure and the result-producing branch (the
main program) appears on the right. The bodies of the two branches appear
below the horizontal dotted line.
The above illustrative defun for volume has three dummy variables, returns
only a single value, has no side effects (i.e., does not change the state of any
system), and refers only to its three local dummy variables (i.e., it does not
refer to any of the actual variables of the overall problem). However, in general, defined functions may have any number of arguments (including no
arguments), may return multiple values (or no values at all), may or may not
perform side effects, and may or may not explicitly refer to the actual variables of the overall problem.
Different names are used to describe the idea of a defined function in different programming languages. In FORTRAN, a subroutine is called a function or subroutine depending on whether or not a single value is returned. In Pascal, a subroutine is labeled as a function or procedure based on this same distinction. In LISP, no such distinction is made and all subroutines are called
functions and defined by means of a defun.
Reusable code can appear in computer programs in several other ways. For example, in some programming languages, such as FORTRAN, single-valued functions consisting of only a simple arithmetic calculation may be defined as an in-line function within
a program without creating an external subroutine or function. These functions can then be referenced repeatedly within the particular program or subprogram in which they are defined. In LISP, the let construction can be used
to bind the value returned by some expression to a variable that can then be repeatedly referenced within the region of a program delineated by the let. In addition, the flet and labels constructions can be used to establish
local definitions of functions.
When a programmer writes a subroutine for volume, the function definition is usually not composed of a particular combination of the actual
variables of the problem. Instead, a function definition is parameterized by
dummy variables (formal parameters), such as arg0, arg1, and arg2.
The function definition is a general, reusable method for computing volume. The dummy variables are usually instantiated with a different combination of the actual variables of the problem on each occasion when
volume is invoked. However, in spite of the different instantiations, volume carries out its work in terms of the dummy variables in precisely the
same way on each occasion. For example, volume may be called with L0,
W0, and H0 on one occasion by
(volume L0 W0 H0).
In addition, volume may be called with L1, W1, and H1 on another occasion
by
(volume L1 W1 H1).
In addition, the dummy variables can be instantiated with expressions consisting of a composition of functions and terminals, rather than mere terminals. For example, volume might be called with (- L0 L1), (- W0 W1),
and (- H0 H1) as its arguments by
(volume (- L0 L1) (- W0 W1) (- H0 H1)).
However, in spite of the different instantiations, volume multiplies the
current value of its three dummy variables and returns that product as its
result.
What is gained by writing the program for the two-boxes problem using a
defined function?
First, once the function volume is defined, it may then be repeatedly called
with different instantiations of its arguments from more than one place in the
main program. Defined functions exploit the underlying regularities and symmetries of a problem by obviating the need to tediously rewrite lines of essentially similar code. In this example, we first call the function volume with L0,
W0, and H0 as instantiations of its three dummy arguments and we then call it
with L1, W1, and H1. Of course, a mere two calls to a function whose work is
as trivial as volume do not create a compelling need for a defined function.
However, there is a considerable advantage to a defined function when a
more complicated calculation must be performed numerous times.
Second, the use of function definitions and calls may improve the parsimony (i.e., decrease the size) of an overall computer program. One of the
ways by which parsimony may be measured is in terms of the size of the
overall program (i.e., the number of points in the parse tree of the program).
The two illustrative programs above for the two-boxes problem do not exhibit
any advantage in terms of parsimony since the simple main program
without the defun (e.g., the program evolved by generation 11 in section 4.3)
contains fewer points than the combination of the main program and the
defun for volume. However, if the work of the defun were less trivial, there
generally would be a considerable improvement in parsimony of the overall program from the use of a defined function.
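The comparison of program sizes can be made concrete by counting points. The Python sketch below represents each program as a nested tuple whose first element is a function and whose remaining elements are its arguments. For the version with a defined function it counts only the two bodies, which is an assumption of this sketch about how size is measured.

```python
def count_points(tree):
    """Number of points (functions and terminals) in a program tree."""
    if isinstance(tree, tuple):
        return 1 + sum(count_points(arg) for arg in tree[1:])
    return 1

without_defun = ("-", ("*", ("*", "W0", "H0"), "L0"),
                      ("*", ("*", "L1", "H1"), "W1"))
volume_body = ("*", "arg0", ("*", "arg1", "arg2"))
main_body = ("-", ("volume", "L0", "W0", "H0"), ("volume", "L1", "W1", "H1"))

print(count_points(without_defun))                           # 11
print(count_points(volume_body) + count_points(main_body))   # 14
```

Even under this favorable way of counting, the version with the defined function is the larger of the two for this trivial calculation.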
Third, if automated learning is involved, the ability to extract a reusable
subroutine may obviate the need to relearn the same behavior or concept on
each separate occasion that it is needed. Function definitions may reduce the
computational burden required to yield a solution to a problem with a satisfactorily high probability.
Fourth, the process of defining and calling a function, in effect, decomposes a given problem into a hierarchy of subproblems. In the two-boxes
problem, the decomposition consists of identifying the subproblem of computing volume. This subproblem is solved by multiplying three numbers.
The solution to the overall problem is obtained by calling the subroutine with
two different instantiations of its three dummy variables and assembling the
results by subtraction.
In practice, a human programmer might or might not choose to encode a
solution to this particular problem using a subroutine because the common
calculation is so simple (merely the product of three numbers), because there
are only two invocations of the common calculation, and because the main
program is so simple (merely a subtraction of the result of the two calls to the
subroutine). However, if the repeated calculation were more substantial (e.g.,
solving a quadratic equation or computing a Taylor series approximation for
the exponential function), virtually every programmer would choose to write
a subroutine, rather than tediously rewrite the code for the common calculation. Furthermore, when an overall program is large, many programmers
prefer to write subroutines to modularize their programs even if no calculations are repeated.
When the main program is executed, the subroutine volume is called twice.
Each of the two-argument multiplications contained in the subroutine is
executed twice so that there is a total of four multiplications. Note that this
number, four, is the same whether or not a subroutine is used. That is, decomposing the problem into subroutines and then repeatedly calling the subroutines does not, in itself, reduce the total number of elementary operations that
must be performed in order to execute an already-known solution to a problem. In fact, because calling a subroutine in most programming languages
usually introduces a certain number of additional operations as overhead,
there is usually a slight increase in the total number of machine instructions
performed.
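The count of four multiplications can be confirmed by instrumenting the multiplication primitive. In the Python sketch below, the counter, the helper multiplications, and the sample dimensions are hypothetical. The sketch counts only multiplications and does not model call overhead.

```python
mult_count = 0

def mul(a, b):
    """Two-argument multiplication that records each invocation."""
    global mult_count
    mult_count += 1
    return a * b

def volume(arg0, arg1, arg2):
    return mul(arg0, mul(arg1, arg2))

def with_subroutine(L0, W0, H0, L1, W1, H1):
    return volume(L0, W0, H0) - volume(L1, W1, H1)

def without_subroutine(L0, W0, H0, L1, W1, H1):
    return mul(mul(W0, H0), L0) - mul(mul(L1, H1), W1)

def multiplications(program, *args):
    """Run the program once and report how many multiplications it performed."""
    global mult_count
    mult_count = 0
    program(*args)
    return mult_count

print(multiplications(with_subroutine, 3, 4, 7, 2, 5, 3),
      multiplications(without_subroutine, 3, 4, 7, 2, 5, 3))  # 4 4
```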
Nonetheless, one beneficial effect of writing subroutines is the generally smaller size of the overall program required to solve the problem.
Such savings are particularly significant when the subroutines are nontrivial. Another beneficial effect of writing subroutines is that it may take the human programmer less time and effort to create the program to solve
the problem. One can speculate that the analog of this latter benefit in the
domain of automated problem-solving is that it might take less computation to learn the solution to a problem with subroutines than without them.
The three-step hierarchical problem-solving process described in chapter
3 is involved whenever a programmer chooses to write a subroutine.
4.5 THE IDEA OF AUTOMATICALLY DEFINED FUNCTIONS
Genetic programming provides a way to bring the benefits of the three-step
hierarchical process described in the previous section to bear on solving
problems.
Genetic programming provides a way to solve a subproblem (i.e., the second
step of the top-down approach). But what about the other steps of this three-step
problem-solving process? How are they to be performed in an automated and
domain-independent way? And, even if the individual steps can be performed
separately, how are they then to be integrated with one another?
One answer appears to be to automate the entire process of writing subroutines and the programs that call them. Figure 4.5 showed an overall program consisting of a defined function called volume and a calling program
that computed the difference between the volumes of two boxes. Our approach
is to use genetic programming to simultaneously evolve functions (automatically defined functions) and calling programs during the same run. When we
talk about "automatically defined functions," we mean that we intend that
genetic programming will automatically and dynamically evolve, by means of
natural selection and genetic operations, a combined structure containing
automatically defined functions and a calling program capable of calling the
automatically defined functions. During the run, genetic programming will
genetically breed a population of programs, each consisting of a definition of
a function in its function-defining branch and a main program in
its result-producing branch. The bodies of both the function-defining branch
and the result-producing branch are each determined by the combined effect,
over many generations, of the selective pressure exerted by the fitness measure and by the effects of the Darwinian reproduction and the crossover
operations. The function defined by the function-defining branch of a
particular individual in the population is available for use by the result-producing branch of that individual. The manner and the number of times, if
any, that the automatically defined function of an individual in the population will actually be called by the result-producing branch of that particular
individual is not predetermined, but is instead determined by the evolutionary process.
The concurrent evolution of functional subunits and calling programs would
enable genetic programming to realize the entire three-step hierarchical problem-solving process described above, automatically and dynamically within
a run of genetic programming.
The program in figure 4.5 is an example of a constrained syntactic structure (Genetic Programming, chapter 19). Each program in the population
contains one function-defining branch and one result-producing branch.
The result-producing branch may call (but is not required to call) the
function-defining branch.
Figure 4.6 shows the overall structure of an individual program consisting of one function-defining branch and one result-producing branch. The
function-defining branch appears in the left part of this figure and the
result-producing branch appears on the right.
Figure 4.6 An overall program consisting of one function-defining branch and one result-producing branch.
This overall program has eight different types of points. The first six types
are invariant and we place them above the horizontal dotted line in the figure
to indicate this. The last two types are noninvariant and constitute the bodies
(work) of the two branches; they appear below the horizontal dotted line.
The eight types are as follows:
(1) the root of the tree (which consists of the place-holding progn
connective function),
(2) the top point, defun, of the function-defining branch,
(3) the name, ADF0, of the automatically defined function,
(4) the argument list of the automatically defined function,
(5) the values function of the function-defining branch identifying, for
emphasis, the value(s) to be returned by the automatically defined
function,
(6) the values function of the result-producing branch identifying, for
emphasis, the value(s) to be returned by the result-producing branch,
(7) the body of the automatically defined function ADF0, and
(8) the body of the result-producing branch.
Each overall program in the population has its own result-producing branch
and its own function-defining branch. Note that each reference to an automatically defined function in the result-producing branch of an overall program in the population refers to the particular automatically defined function
belonging to that overall program (and not to any other identically-named
automatically defined function belonging to some other program in the
population).
If more than one value is to be returned by the overall program, there are
multiple arguments to the values function of the result-producing branch
(point 6 in figure 4.6). That is, the result-producing branch consists of multiple subbranches under the values function. When the progn evaluates its
last argument (i.e., the values at point 6 associated with the result-producing branch), the multiple values returned by the subbranches of the result-producing branch are returned as the output of the overall program.
The result-producing branch typically contains the actual variables of the
problem. The actual variables of the problem usually do not appear in the
function-defining branches, although they may be made directly available to
such branches.
In general, a program may contain more than one function-defining
branch. The number of different types of points in programs involving
automatically defined functions is always at least eight (as shown in figure 4.6); however, there may be more than eight types if there is more than
one function-defining branch. If a program has more than one function-defining branch, each such branch may potentially refer to the others. For
example, a function-defining branch might be permitted to refer hierarchically to any function that has already been defined by an earlier function-defining branch. Potentially, a function-defining branch may
recursively refer to itself. However, we do not discuss recursion in this
book.
When storing a program having the above structure in a computer, we do
not actually create a LISP S-expression containing the invariant points of types
1 through 6 (i.e., the points above the horizontal dotted line in figure 4.6). In
practice, only the bodies of the function-defining branch(es) and the body of
the result-producing branch of an overall program (i.e., the points of types 7
and 8 in figure 4.6) are actually created and explicitly stored. These bodies are
gathered together as arguments to a top-level LIST function. The overall program represented by the set of bodies created by this LIST function is then
interpreted in a manner semantically equivalent to the structure described
above (i.e., as if all the points above the horizontal dotted line were present).
Appendix E presents details on the implementation of automatically defined
functions on a computer in LISP.
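The idea of storing only the bodies and interpreting them as if the invariant points were present can be sketched in Python. This is not the implementation of appendix E. The nested-tuple representation, the interpreter, the parameter names, and the hand-written (not evolved) bodies are all assumptions made for illustration.

```python
import operator

PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env, adf0_body, adf0_params=("ARG0", "ARG1", "ARG2")):
    """Evaluate a nested-tuple expression; ADF0 refers to this program's own ADF."""
    if not isinstance(expr, tuple):
        return env[expr]                           # terminal: look up its value
    args = [evaluate(a, env, adf0_body, adf0_params) for a in expr[1:]]
    if expr[0] == "ADF0":
        local_env = dict(zip(adf0_params, args))   # only dummy variables are visible
        return evaluate(adf0_body, local_env, adf0_body, adf0_params)
    return PRIMITIVES[expr[0]](*args)

def run(program, actual_variables):
    adf0_body, result_body = program               # the stored list of bodies
    return evaluate(result_body, actual_variables, adf0_body)

program = [
    ("*", "ARG0", ("*", "ARG1", "ARG2")),                            # body of ADF0
    ("-", ("ADF0", "L0", "W0", "H0"), ("ADF0", "L1", "W1", "H1")),   # result branch
]
values = dict(L0=3, W0=4, H0=7, L1=2, W1=5, H1=3)
print(run(program, values))  # 54
```

Because each program carries its own ADF0 body, a call to ADF0 in one program's result-producing branch can never reach the identically named function of another program.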
As will be seen, an automatically defined function can
• perform a calculation similar to that which a human programmer
might use,
• perform a calculation unlike anything a human programmer would
ever use,
• redundantly define a function that is equivalent to a primitive function that
is already present in the function set of the problem,
• ignore some of its dummy variables,
• be entirely ignored by every potential calling branch,
• define a constant value (i.e., a value that is independent of all of the dummy
variables and any other variables that may be available to the automatically
defined function),
• return a value identical to one of the dummy variables (so that the
automatically defined function redundantly defines a terminal that is
already present in the terminal set of the problem), or
• call another automatically defined function with a subset of, or a permutation of, its dummy variables.
The need for reusable subroutines appears in every area of artificial intelligence, machine learning, and neural networks.
Many existing paradigms for machine learning, artificial intelligence, and
neural networks automatically and dynamically define functional subunits
during runs (the specific terminology being, of course, specific to the particular paradigm). However, automatically defined functions operate differently
than the functional subunits one sometimes encounters in such paradigms.
We illustrate this point with an example from the field of pattern recognition.
Consider the problem of learning to recognize a pattern presented as an
array of pixels in which the same feature appears in two different places within
the overall pattern. Specifically, suppose a feature consisting of a vertical line
segment within a 3-by-3 pixel region appears in both the upper middle of a
9-by-5 array of pixels and the lower middle of the same array.
Figure 4.7 shows the 3-by-3 pixel feature defining a vertical line segment.
Figure 4.8 shows the 3-by-3 feature from figure 4.7 in two different locations within the overall 9-by-5 array of pixels. The two occurrences of the
3-by-3 feature are framed in the figure.
Many existing paradigms from the fields of artificial intelligence, machine
learning, and neural networks are capable of learning to recognize the overall
pattern described above. These paradigms are able to efficiently discover the 3-by-3 feature among the nine pixels p11, p12, p13, p21, p22, p23, p31, p32, and p33 in the upper middle of the 9-by-5 array. They are also able to independently
rediscover this same feature among the nine pixels p51, p52, p53, p61, p62, p63,
p71, p72, and p73 in the lower middle of the array. But most existing paradigms generally do not provide a way to discover this common feature just
once, to generalize the detector of the feature so that it is not hardwired to particular pixels (but is, instead, parameterized), and then reuse the generalized feature detector in a parameterized way to recognize occurrences of this
common feature in different 3-by-3 pixel regions within the overall array.
Specifically, let us consider the way that neural networks (Rumelhart,
Hinton, and Williams 1986; Hinton 1989) and genetic classifier systems (Holland 1986; Holland et al. 1986) might treat this problem of pattern recognition.
We first consider neural networks.
Figure 4.9 shows the 9-by-5 array of pixels, two occurrences of the same
3-by-3 feature, and two neurons, each capable of recognizing the 3-by-3
feature.
Figure 4.7 A 3-by-3 pixel feature consisting of a vertical line segment.
Figure 4.8 Two identical 3-by-3 pixel features in a 9-by-5 array of pixels.
Various different neural network architectures and training paradigms can
be successfully used to train a 45-input neural network to recognize the 3-by-3 feature located at pixels p11, p12, p13, p21, p22, p23, p31, p32, and p33 within the
9-by-5 array of pixels. The learning necessary to recognize this 3-by-3 feature
might be embodied in the simple subassembly consisting of a single neuron
shown at the top right of figure 4.9. There are nine weighted connections
between this neuron and its nine inputs, p11, p12, p13, p21, p22, p23, p31, p32, and
p33. Negative weights (-1) are assigned to the connections from the pixels p11,
p13, p21, p23, p31, and p33 and positive weights (+1) are assigned to the connections from pixels p12, p22, and p32. The sum of the nine products of the weights
and inputs is +3. Since this sum equals this neuron's threshold of 3, the neuron emits an output of +1 indicating recognition of the 3-by-3 feature. Thus,
the subassembly consisting of this first neuron and these weights is capable
of recognizing the first occurrence of the 3-by-3 feature.
The neural network can also learn to recognize the occurrence of this
same 3-by-3 feature located at pixels p51, p52, p53, p61, p62, p63, p71, p72, and
p73 within the 9-by-5 array. The learning necessary to recognize the second occurrence of the feature might be embodied in the subassembly consisting of the second neuron shown at the bottom right of figure 4.9. This
second neuron has nine weighted connections (-1 for p51, p53, p61, p63, p71,
and p73 and +1 for p52, p62, and p72). As before, the sum of the weighted
inputs is +3 and equals this neuron's threshold, so the neuron emits an
output of +1 indicating recognition of the 3-by-3 feature. However, with the
usual implementations of most existing connectionist paradigms, this second
set of nine weights would be learned entirely separately and independently
from the first set of nine weights. This is true even though the "same" 3-by-3
feature is involved and even though the same set of nine weights can recognize the feature.
Figure 4.9 Two neurons recognizing a vertical line segment located in two places in a 9-by-5
array of pixels. Each neuron is labeled with its firing condition on the weighted sum of its inputs and a threshold of 3.
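The computation performed by each of these neurons can be sketched in Python. The sketch assumes that pixels take the values 0 and 1 and that the neuron fires when the weighted sum reaches its threshold, which matches the statement above that a sum equal to the threshold of 3 produces an output of +1. The function name and the input lists are hypothetical.

```python
def neuron(weights, inputs, threshold=3):
    """Emit 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# The nine weights, row by row: -1 on the two side columns, +1 down the middle.
weights = [-1, +1, -1] * 3

vertical_line = [0, 1, 0] * 3   # assumed 0/1 pixels forming the feature
all_on = [1] * 9                # a region in which every pixel is on

print(neuron(weights, vertical_line), neuron(weights, all_on))  # 1 0
```

In a conventional network the second neuron would hold its own separately learned copy of the list weights, even though the two copies are identical.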
In contrast, a human programmer writing a computer program to recognize this 3-by-3 feature would write a general nine-argument subroutine just
once and then call the reusable subroutine twice (instantiating the first call
with the actual variables p11, ..., p33 as arguments and instantiating the second call with the actual variables p51, ..., p73). The writing of a single reusable
subroutine by the human corresponds to the neural network doing its learning just once; embodying its learning in a subassembly; making a copy of the
already-learned subassembly; positioning the copy in a new location in the
overall neural net; connecting nine different pixels as inputs to the copy of the
subassembly in its new location; and consolidating the outputs of the two
subassemblies in the same output neuron (not shown).
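A Python sketch of the programmer's approach follows. The nine-argument detector is written once in terms of dummy variables and is then called on two different 3-by-3 regions of the array. The array contents, the 0-based indexing, the helper region, and the 0/1 pixel values are assumptions of the sketch.

```python
def detect_vertical_line(a, b, c, d, e, f, g, h, i):
    """Nine-argument detector; its arguments are dummy variables for a 3-by-3 region."""
    weights = (-1, 1, -1, -1, 1, -1, -1, 1, -1)
    return sum(w * x for w, x in zip(weights, (a, b, c, d, e, f, g, h, i))) >= 3

def region(image, top, left):
    """The nine pixels of the 3-by-3 region whose upper-left corner is (top, left)."""
    return [image[r][c] for r in range(top, top + 3) for c in range(left, left + 3)]

# Hypothetical 9-by-5 array of 0/1 pixels with vertical segments in rows 1-3 and 5-7
# (0-based row indices 0-2 and 4-6), in the second column.
image = [[0] * 5 for _ in range(9)]
for r in (0, 1, 2, 4, 5, 6):
    image[r][1] = 1

first = detect_vertical_line(*region(image, 0, 0))    # pixels p11, ..., p33
second = detect_vertical_line(*region(image, 4, 0))   # pixels p51, ..., p73
print(first, second)
```

The single definition is reused unchanged; only the actual pixels passed to it differ between the two calls.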
When a set of weights is discovered enabling a particular neuron in a neural network to perform some subtask (e.g., recognize the 3-by-3 features above,
detect an edge, perform the behavior of the exclusive-or function, etc.), the
training process can be viewed as a process of defining a function (i.e., creating a function taking the values of the specific inputs to that neuron as arguments and producing a binary output signal whose value is determined by
whether or not the threshold of the neuron is exceeded). Moreover, a process
of abstraction occurs when this neural function is used in that all other inputs
to the neural network that are not connected (or are connected with a zero or
negligible weight) to the neuron involved play no role in computing the value
of the neural function being defined (i.e., in producing the output signal).
The neural function thus defined differs from the automatically defined
functions that we have been discussing. The neural function is called only
once from within the neural network and it is called only by the specific part
of the neural net where it is created. Moreover, this neural function is called
only with the one particular fixed set of inputs that happens to be hardwired
to a specific neuron. Conceivably the subtask performed by the neuron might
be useful elsewhere in the neural network. That is, the same set of weights,
thresholds, and biases that enable the neural function to perform its calculation might be useful elsewhere in the neural network to perform a similar
calculation. However, the usual implementations of most existing paradigms
for training neural networks do not provide a way to reuse the set of connections and weights that are discovered in one part of the network in other parts
of the network where a similar subtask must be performed on a different set
of inputs. That is, there is no propagation of a generalized structure; there are
no dummy variables that are capable of being instantiated with different sets
of inputs; there is no reuse of a useful neural function in more than one place.
Instead, the training algorithm for the neural net has to independently rediscover the useful combination of weights, thresholds, and biases for every
neuron that needs to perform the same calculation on its particular inputs.
The above description greatly simplifies the way most modern neural
networks work. For example, a subassembly for detecting a feature would
typically be far more complicated than one neuron in an actual neural network; the multiple neurons involved would probably be arranged in layers
creating a hierarchy; the weights would probably be floating-point values,
rather than just -1 and +1; and sigmoidal signals would probably be used.
Nonetheless, the above example correctly makes the point that a particular
subassembly for recognizing a feature is usually hardwired to a particular
nine pixels in the usual implementations of the most popular neural network
architectures and training paradigms.
The field of neural networks is vast and some researchers have attempted
to deal with the discovery of modular features in neural networks. For
example, the neocognitron (Fukushima and Miyake 1981; Fukushima, Miyake,
and Takatuki 1983; Fukushima 1989) is a multilayer neural network that can
recognize a displaced or distorted pattern. In some neural network architectures, some weights are common to groups of neurons belonging to a receptor field, so that something that is learned in one part of the field is available
to the other neurons of the group.
p00   p10   p20   p30   p40   p50   p60   p70   p80
***** *010* *010* *010* ***** ***** ***** ***** *****
Figure 4.10 Condition part of classifier system rule for recognizing the first occurrence of the
3-by-3 feature.
Le Cun et al. (1990) describe this weight sharing architecture as follows:
A fully connected network with enough discriminative power for the task
would have far too many parameters to be able to generalize correctly. Therefore a restricted connection-scheme must be devised, guided by our prior
knowledge about shape recognition. There are well-known advantages to
performing shape recognition by detecting and combining local features. We
have required our network to do this by constraining the connections in the
first few layers to be local. In addition, if a feature detector is useful on one
part of the image it is likely to be useful on other parts of the image as well.
One reason for this is that the salient features of a distorted character might
be displaced slightly from their position in a typical character. One solution
to this problem is to scan the input image with a single neuron that has a local
receptive field, and store the states of this neuron in corresponding locations
in a layer called a feature map. This operation is equivalent to a convolution
with a small size kernel, followed by a squashing function. The process can
be performed in parallel by implementing the feature map as a plane of neurons whose weight vectors are constrained to be equal. That is, units in a
feature map are constrained to perform the same operation on different parts
of the image. An interesting side-effect of this weight sharing technique, already
described in Rumelhart, Hinton, and Williams 1986, is to reduce the number
of free parameters by a large amount, since a large number of units share the
same weights. ... In practice, it will be necessary to have multiple feature
maps, extracting different features from the same image.
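The scanning operation described in this passage can be illustrated with a short Python sketch. It is not taken from Le Cun et al.; the kernel is the nine-weight detector of the earlier example, and the bias value, the use of tanh as the squashing function, the test image, and all names are assumptions made for illustration.

```python
import math

# One shared 3-by-3 weight kernel (the same nine weights at every position).
KERNEL = [[-1, +1, -1],
          [-1, +1, -1],
          [-1, +1, -1]]
BIAS = -2.5  # hypothetical bias: only a perfect match (sum of 3) gives a positive activation


def feature_map(image, kernel, bias):
    """Scan one shared kernel over every window of the image and squash each sum with tanh."""
    rows, cols = len(image), len(image[0])
    k = len(kernel)
    fmap = []
    for top in range(rows - k + 1):
        fmap_row = []
        for left in range(cols - k + 1):
            total = sum(kernel[di][dj] * image[top + di][left + dj]
                        for di in range(k) for dj in range(k))
            fmap_row.append(math.tanh(total + bias))
        fmap.append(fmap_row)
    return fmap


# Assumed 9-by-5 test image with the feature in rows 1-3 and rows 5-7.
FEATURE_ROW = [0, 0, 1, 0, 0]
BLANK_ROW = [0, 0, 0, 0, 0]
IMAGE = [BLANK_ROW, FEATURE_ROW, FEATURE_ROW, FEATURE_ROW, BLANK_ROW,
         FEATURE_ROW, FEATURE_ROW, FEATURE_ROW, BLANK_ROW]

FMAP = feature_map(IMAGE, KERNEL, BIAS)
# Window positions (top row, left column) where the shared detector responds.
HITS = [(top, left) for top, row in enumerate(FMAP)
        for left, activation in enumerate(row) if activation > 0]
print(HITS)
```

Because every position of the feature map uses the same kernel, there are only nine free weights (plus one bias) no matter how many window positions are scanned.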
In addition, recent work on cellular encodings (Gruau 1992a, 1992b, 1993a,
1993b, 1994a, 1994b; Gruau and Whitley 1993a, 1993b) has applied genetic
programming to evolve neural networks that exploit regularities in the problem environment (see subsection F.4.1 in appendix F). Nonetheless, the above
example correctly reflects the usual implementation of the most popular neural network architectures and training paradigms.
We now consider genetic classifier systems.
Genetic classifier systems learn sets of if-then rules capable of solving problems. An if-then rule consists of a condition (if-part) and an associated action
(then-part). A genetic classifier system with 45 environmental inputs is, in principle, capable of recognizing the two occurrences of the 3-by-3 feature within
the 9-by-5 array of pixels in figure 4.8.
A classifier system might learn to recognize the first (upper) occurrence of
the feature by creating an if-then rule whose condition has the 45 symbols
shown in figure 4.10. This condition is satisfied when pixels p11, p13, p21, p23,
p31, and p33 are all 0 and when pixels p12, p22, and p32 are all 1. The don't care
symbols (*) appearing in the other 36 positions cause the values of the other
36 pixels to be ignored. The satisfaction of the condition fires the rule and
triggers an action that represents recognition of the 3-by-3 feature in the upper
part of the 9-by-5 array of pixels.
p00   p10   p20   p30   p40   p50   p60   p70   p80
***** ***** ***** ***** ***** *010* *010* *010* *****
Figure 4.11 Condition part of classifier system rule for recognizing the second occurrence of
the 3-by-3 feature.
Similarly, the classifier system might learn to recognize the second occurrence of the feature by creating a rule whose condition is shown in figure 4.11.
This condition is satisfied when pixels p51, p53, p61, p63, p71, and p73 are all 0
and when pixels p52, p62, and p72 are all 1.
A reusable subroutine for recognizing the two occurrences of the 3-by-3
feature operates very differently from the way that a classifier system operates. The creation of a reusable subroutine would correspond to an ability on
the part of the classifier system to do the learning just once; to embody its
learning in a first rule whose condition has nine active positions associated
with pixels p11, p12, p13, p21, p22, p23, p31, p32, and p33; to make a copy of the
first rule; to add the copy of the first rule to the classifier
system's set of if-then rules; to modify the copy by moving the rule's nine
active positions to a different nine positions within the 45 positions so that the
modified rule is capable of identifying the feature when it appears in pixels
p51, p52, p53, p61, p62, p63, p71, p72, and p73 within the 9-by-5 array; and to give
the second if-then rule the same action as the first rule.
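The hypothetical copy-and-move operation on a rule's condition can be sketched as follows. Classifier systems do not ordinarily provide such an operation, which is the point of the passage; the function names, the row-by-row layout of the 45-symbol condition, and the test message are assumptions for illustration.

```python
ROW_WIDTH = 5  # each of the nine rows of the array contributes five symbols


def condition_from_rows(rows):
    """Join nine 5-symbol row patterns into one 45-symbol condition."""
    if len(rows) != 9 or any(len(r) != ROW_WIDTH for r in rows):
        raise ValueError("expected nine rows of five symbols each")
    return "".join(rows)


def matches(condition, message):
    """A condition is satisfied when every symbol other than * equals the message bit."""
    return len(condition) == len(message) and all(
        c == "*" or c == m for c, m in zip(condition, message))


def shift_down(condition, n_rows):
    """Copy a condition with its active positions moved down by n_rows rows."""
    shift = n_rows * ROW_WIDTH
    tail = condition[len(condition) - shift:]
    if set(tail) - {"*"}:
        raise ValueError("active positions would fall off the array")
    return "*" * shift + condition[:len(condition) - shift]


# Condition of figure 4.10 (first occurrence of the feature).
FIRST = condition_from_rows(["*****", "*010*", "*010*", "*010*", "*****",
                             "*****", "*****", "*****", "*****"])
# Moving the nine active positions down four rows gives the condition of figure 4.11.
SECOND = shift_down(FIRST, 4)

# Assumed 45-bit environmental message containing both occurrences.
MESSAGE = "".join(["00000", "00100", "00100", "00100", "00000",
                   "00100", "00100", "00100", "00000"])
print(matches(FIRST, MESSAGE), matches(SECOND, MESSAGE))
```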
4.6 PREPARATORY STEPS WITH ADFs
In section 4.2, we applied the five major preparatory steps of genetic programming to the two-boxes problem and set up the problem as shown in
table 4.2, the tableau without ADFs. We then applied genetic programming
to the problem using a population size of 4,000 and obtained the following
100%-correct solution:
(- (* (* W0 H0) L0) (* (* L1 H1) W1)).
This solution can be viewed as a main program whose inputs are L0, W0,
H0, L1, W1, and H1. The output of the program is the single value returned by
the entire expression.
Before applying genetic programming with automatically defined functions to the two-boxes problem, it is first necessary to choose the number of
function-defining branches that are to be available and the number of arguments possessed by each automatically defined function. If there is more than
one automatically defined function involved, it is also necessary to determine the nature of the references (if any) allowed between the defined functions. This group of architectural choices (required because automatically
defined functions are being used) constitutes the sixth major step in preparing
to use genetic programming. In practice, this sixth major step is performed
first when automatically defined functions are involved (i.e., this step is performed before the usual five major preparatory steps).
The sixth major step in preparing to use genetic programming may require
some analysis of the problem. Since the two-boxes problem involves boxes of
dimensionality 3, we decided that 3 is an appropriate choice for the number
of arguments for a defined function for this problem. We also decided to
employ one defined function for this problem. Consequently, each individual
overall program in the population will consist of one three-argument function-defining branch (defining a function named ADF0) and one result-producing branch. For many problems,
considerations of available computer resources (i.e., computer time and
memory) will, as a practical matter, drive these choices. Chapter 7 summarizes the four methods we usually use to make these architectural choices.
However, chapters 21 through 25 demonstrate that even these choices can, if
desired, be left to the evolutionary process.
Once the sixth major step has been performed, it is also necessary to
specify the terminal set for the result-producing branch, the function set
for the result-producing branch, the terminal set for each function-defining branch, and the function set for each function-defining branch. That
is, it is necessary to perform the first and second major steps for each branch
of the overall program.
We first consider the result-producing branch.
The purpose of the yet-to-be-evolved computer program is to take the six
inputs and produce one output. Thus, the result-producing branch should be
a program whose input consists of the six actual variables of the problem, L0,
W0, H0, L1, W1, and H1, and whose output represents the value of the single
dependent variable of the problem. Thus, the terminal set, $\mathcal{T}_{rpb}$, for the result-producing branch is
$\mathcal{T}_{rpb} = \{\mathtt{L0}, \mathtt{W0}, \mathtt{H0}, \mathtt{L1}, \mathtt{W1}, \mathtt{H1}\}$.
The function set of the result-producing branch will contain the four arithmetic operations of addition, subtraction, multiplication, and protected division % (section 4.2). Since automatically defined functions are being used, the
function set of the result-producing branch also contains the automatically
defined function ADF0. Thus, the function set, $\mathcal{F}_{rpb}$, for the result-producing
branch is
$\mathcal{F}_{rpb} = \{\mathtt{ADF0}, +, -, *, \%\}$
with an argument map for this function set, $\mathcal{F}_{rpb}$, of
$\{3, 2, 2, 2, 2\}$.
The result-producing branch of each individual program in the population
is a composition of primitive functions from the function set, $\mathcal{F}_{rpb}$, and terminals from the terminal set, $\mathcal{T}_{rpb}$.
This problem illustrates a frequently useful way of constructing the terminal set and function set for the result-producing branch. Specifically, when
automatically defined functions are being used, the terminal set, $\mathcal{T}_{rpb}$, of the
result-producing branch is the same as the terminal set, $\mathcal{T}$, that would have
been used if automatically defined functions were not involved. The function
set, $\mathcal{F}_{rpb}$, of the result-producing branch is the union of the available automatically defined functions (just ADF0 here) and the function set, $\mathcal{F}$, that would
have been used if automatically defined functions were not involved.
We now apply the first and second major steps to the function-defining
branch.
Since the function-defining branch defines a function in terms of three
dummy variables, the terminal set, $\mathcal{T}_{adf}$, for the function-defining branch is
$\mathcal{T}_{adf} = \{\mathtt{ARG0}, \mathtt{ARG1}, \mathtt{ARG2}\}$.
The function set, $\mathcal{F}_{adf}$, for the function-defining branch is
$\mathcal{F}_{adf} = \{+, -, *, \%\}$
with an argument map for this function set, $\mathcal{F}_{adf}$, of
$\{2, 2, 2, 2\}$.
The function-defining branch of each individual program in the population is a composition of primitive functions from the function set, $\mathcal{F}_{adf}$, and
terminals from the terminal set, $\mathcal{T}_{adf}$.
This problem also illustrates a frequently useful way of constructing the
terminal set and function set for the function-defining branch. When automatically defined functions are being used, the terminal set, $\mathcal{T}_{adf}$, of the function-defining branch consists of as many dummy variables as there are
arguments for the automatically defined function involved. Since the automatically defined function here has three arguments, $\mathcal{T}_{adf}$ consists of ARG0,
ARG1, and ARG2. The function set, $\mathcal{F}_{adf}$, of the function-defining branch is the
same function set, $\mathcal{F}$, that would have been used if automatically defined functions were not involved.
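The terminal sets, function sets, and argument maps for the two branches can be written down directly as data. The following Python sketch assumes, purely for illustration, that programs are represented as nested tuples of the form `(function, argument, ...)`; the names `is_valid` and the four constants are hypothetical.

```python
# Terminal sets and function sets (with argument maps) for each branch
# of the two-boxes problem with one three-argument ADF.
RPB_TERMINALS = {"L0", "W0", "H0", "L1", "W1", "H1"}
RPB_FUNCTIONS = {"ADF0": 3, "+": 2, "-": 2, "*": 2, "%": 2}
ADF_TERMINALS = {"ARG0", "ARG1", "ARG2"}
ADF_FUNCTIONS = {"+": 2, "-": 2, "*": 2, "%": 2}


def is_valid(tree, terminals, functions):
    """Check that a branch uses only its own terminals and functions with the right arity."""
    if isinstance(tree, str):
        return tree in terminals
    name, *args = tree
    return (functions.get(name) == len(args)
            and all(is_valid(arg, terminals, functions) for arg in args))


# A result-producing branch may call ADF0 but may not use dummy variables.
rpb = ("-", ("ADF0", "L1", "W1", "H1"), ("ADF0", "W0", "H0", "L0"))
# A function-defining branch may use dummy variables but not the actual variables.
adf = ("*", "ARG0", ("*", "ARG1", "ARG2"))
print(is_valid(rpb, RPB_TERMINALS, RPB_FUNCTIONS),
      is_valid(adf, ADF_TERMINALS, ADF_FUNCTIONS))
```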
The third, fourth, and fifth major preparatory steps in solving the two-boxes
problem are the same with automatically defined functions as without them.
Table 4.3 is called the tableau with ADFs for the problem and is the first of 16
such tableaux in this book. This tableau supplements the tableau without
ADFs (table 4.2) for this problem with the information specifically applicable
to the use of automatically defined functions. The tableau with ADFs relates
to the architecture of the overall program and the terminal set and function
set of each branch of the overall program (i.e., the sixth major preparatory
step).
The second row of the tableau with ADFs reflects the choice of the architecture of the overall programs for the problem. The architecture includes the number of function-defining branches and the number of arguments possessed by each automatically defined function. If there were more than one automatically defined function, the architectural information would also specify whether the automatically defined functions may, or may not, make reference to one another in a hierarchical way.
The third row identifies the values of any parameters relating to the use of automatically defined functions. This row identifies the chosen way of assigning types to the noninvariant points of an overall program (i.e., branch typing or point typing as described in section 4.8).
The fourth and fifth rows specify the terminal set, $\mathcal{T}_{rpb}$, and function set, $\mathcal{F}_{rpb}$, for the result-producing branch.
The sixth and seventh rows specify the terminal set, $\mathcal{T}_{adf}$, and function set, $\mathcal{F}_{adf}$, for the first function-defining branch (defining ADF0).
There may be more than seven rows in a tableau with ADFs if there is more than one automatically defined function. The tableau contains additional rows in the event that there is more than one kind of automatically defined function (e.g., automatically defined functions with different terminal sets, different function sets, or different argument maps).
Table 4.3 Tableau with ADFs for the two-boxes problem.
Objective: Find a program that produces the observed value of the single dependent variable, D, as its output when given the values of the six independent variables as input.
Architecture of the overall program with ADFs: One result-producing branch and one three-argument function-defining branch defining a three-argument automatically defined function ADF0.
Parameters: Branch typing (described in section 4.8).
Terminal set for the result-producing branch: The six actual variables of the problem, L0, W0, H0, L1, W1, and H1.
Function set for the result-producing branch: +, -, *, and %, and the one three-argument automatically defined function ADF0.
Terminal set for the function-defining branch ADF0: The three dummy variables ARG0, ARG1, and ARG2.
Function set for the function-defining branch ADF0: The four primitive functions of the problem, +, -, *, and %.
4.7 CREATION OF THE INITIAL RANDOM POPULATION
When automatically defined functions are being used, the initial random generation of the population must be created so that each individual overall program in the population has the intended constrained syntactic structure (e.g., one function-defining branch and one result-producing branch for the two-boxes problem). Specifically, every individual program in generation 0 for the current example must have (or behave as if it has) the invariant structure represented by the six points of types 1 through 6 shown above the dotted line in figure 4.6. Each function and terminal in the function-defining branch of the current example is of type 7. The function-defining branch is a random composition of functions from the function set, $\mathcal{F}_{adf}$, and terminals from the terminal set, $\mathcal{T}_{adf}$. Each function and terminal in the result-producing branch of the current example is of type 8. The result-producing branch is a random composition of functions from the function set, $\mathcal{F}_{rpb}$, and terminals from the terminal set, $\mathcal{T}_{rpb}$.
Figure 4.12 A randomly created program from generation 0 for the two-boxes problem with ADFs.
Figure 4.12 shows a randomly created program from generation 0 for the
two-boxes problem consisting of one function-defining branch and one result-producing branch. In this program ADF0 computes the sum of its three dummy
variables. The result-producing branch computes $H_1 (L_1 + W_0 + H_0)$.
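The creation of one two-branch individual can be sketched as follows. This is a minimal illustration and not the generative method used for the runs in this book: the nested-tuple representation, the simple "grow" procedure, the 0.3 probability of choosing a terminal, and the depth limit are all assumptions.

```python
import random

RPB_TERMINALS = ["L0", "W0", "H0", "L1", "W1", "H1"]
RPB_FUNCTIONS = {"ADF0": 3, "+": 2, "-": 2, "*": 2, "%": 2}
ADF_TERMINALS = ["ARG0", "ARG1", "ARG2"]
ADF_FUNCTIONS = {"+": 2, "-": 2, "*": 2, "%": 2}


def random_tree(terminals, functions, depth, rng, root=True):
    """Grow a random tree; the root is a function, and depth bounds the nesting."""
    if depth == 0 or (not root and rng.random() < 0.3):
        return rng.choice(terminals)
    name = rng.choice(sorted(functions))
    args = tuple(random_tree(terminals, functions, depth - 1, rng, root=False)
                 for _ in range(functions[name]))
    return (name,) + args


def random_program(rng, max_depth=4):
    """Create one individual: each branch is drawn only from its own sets."""
    return {"ADF0": random_tree(ADF_TERMINALS, ADF_FUNCTIONS, max_depth, rng),
            "RPB": random_tree(RPB_TERMINALS, RPB_FUNCTIONS, max_depth, rng)}


def symbols(tree):
    """Set of all function and terminal names appearing in a tree."""
    if isinstance(tree, str):
        return {tree}
    return {tree[0]}.union(*(symbols(arg) for arg in tree[1:]))


rng = random.Random(0)
population = [random_program(rng) for _ in range(200)]
print(population[0])
```

Only the bodies of the two branches are generated; the invariant points (the progn, defun, argument list, and values points) are implied by the dictionary keys rather than stored.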
4.8 STRUCTURE-PRESERVING CROSSOVER AND TYPING
In the crossover operation, a crossover point is randomly chosen in each of
two parents and genetic material from one parent is then inserted into a part
of the other parent to create an offspring. When automatically defined functions are involved, each program in the population conforms to a constrained
syntactic structure. Crossover must be performed in a structure-preserving
way so as to preserve the syntactic validity of all offspring.
As already mentioned, every point in an overall program is assigned a
type. Some of the points in an overall program are invariant over the entire
population. For example, every program in the population with one function-defining branch and one result-producing branch as described above
must have the invariant structure represented by the six points of types 1
through 6 shown above the dotted line in figures 4.6 and 4.12. More complex overall programs typically have more than six such invariant points.
Structure-preserving crossover never alters the invariant points of an overall
program, so none of these invariant points are ever eligible to be crossover
points in structure-preserving crossover. Instead, structure-preserving crossover is restricted to the noninvariant points shown below the dotted line in
figures 4.6 and 4.12. As previously mentioned, the invariant points of programs with automatically defined functions are not actually created, stored,
or manipulated in our computer implementation of this process.
Every noninvariant point in the overall program is also assigned a type.
The basic idea of structure-preserving crossover is that any noninvariant point
anywhere in the overall program is randomly chosen, without restriction, as
the crossover point of the first parent. Then, once the crossover point of the
first parent has been chosen, the crossover point of the second parent is randomly chosen from among points of the same type. The typing of the
noninvariant points of an overall program constrains the set of subtrees that
can potentially replace the point and the subtree below it. This typing is done
so that the structure-preserving crossover operation will always produce
valid offspring.
The following two ways of assigning types to the noninvariant points of an
overall program are employed in this book:
• Branch typing assigns a different type uniformly to all the noninvariant points
of each separate branch of an overall program. There are as many types of
noninvariant points as there are branches in the overall program.
• Point typing assigns a type to each individual noninvariant point in the overall program. The type assigned reflects the function set of the branch where
the point is located, the terminal set of the branch where the point is
located, the argument map of the function set of the branch where the point
is located, and any syntactic constraints applicable to the branch where the
point is located.
Branch typing is the default choice for the way of assigning types. It is used
on the two-boxes problem.
Branch typing can be illustrated using the program shown in figure 4.12. In
branch typing, a first type (type 7) is assigned to all five points in the body of
the function-defining branch and a second type (type 8) is assigned to all six
points in the body of the result-producing branch. When structure-preserving crossover is performed, any noninvariant point anywhere in the overall
program (i.e., any of the 11 points of type 7 or 8) may be chosen, without
restriction, as the crossover point of the first parent. However, the crossover
point of the second parent must be chosen only from among points of this
same type. In the context of this example, if the crossover point of the first
parent is from the function-defining branch (type 7), the crossover point of
the second parent is restricted to its function-defining branch (its type 7 points);
if the crossover point of the first parent is from the result-producing branch
(type 8), the crossover point of the second parent is restricted to its result-producing branch (its type 8 points). In other words, structure-preserving
crossover will either exchange a subtree from a function-defining branch only
with a subtree from another function-defining branch or it will exchange a
subtree from a result-producing branch only with a subtree from another
result-producing branch. The restriction on the choice of the crossover point
of the second parent ensures the syntactic validity of the offspring.
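Structure-preserving crossover with branch typing can be sketched as follows. The sketch assumes a nested-tuple representation, produces a single offspring per call, and uses two hypothetical parents (the first has five points in its function-defining branch and six in its result-producing branch, like the program of figure 4.12, but its exact shape is invented).

```python
import random

BRANCHES = ("ADF0", "RPB")  # branch typing: one type per branch


def points(tree, path=()):
    """Yield the path (child indices from the root) of every point in a tree."""
    yield path
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from points(tree[i], path + (i,))


def subtree(tree, path):
    """Return the subtree rooted at the given path."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def leaves(tree):
    """List the terminals of a tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for arg in tree[1:] for leaf in leaves(arg)]


def crossover(parent1, parent2, rng):
    """Pick any noninvariant point in parent1, then a point of the same type in parent2."""
    candidates = [(b, p) for b in BRANCHES for p in points(parent1[b])]
    branch, path1 = rng.choice(candidates)
    path2 = rng.choice(list(points(parent2[branch])))
    child = dict(parent1)
    child[branch] = replace(parent1[branch], path1, subtree(parent2[branch], path2))
    return child


PARENT1 = {"ADF0": ("+", "ARG0", ("+", "ARG1", "ARG2")),
           "RPB": ("*", "H1", ("ADF0", "L1", "W0", "H0"))}
PARENT2 = {"ADF0": ("*", "ARG0", ("*", "ARG1", "ARG2")),
           "RPB": ("-", ("ADF0", "L0", "W0", "H0"), ("ADF0", "L1", "W1", "H1"))}

print(crossover(PARENT1, PARENT2, random.Random(1)))
```

Because the second crossover point is always drawn from the same branch as the first, a dummy variable can never migrate into the result-producing branch and an actual variable can never migrate into the function-defining branch.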
Point typing is used when the architecture of the overall program is being
evolved during the run. It will be described in detail in chapter 21 where it is
first used and then illustrated in chapters 21 through 25. (Another approach,
like-branch typing, is discussed in section 15.4, but not used in this book.)
There is a fundamental difference between a crossover occurring in a function-defining branch versus one occurring in the result-producing branch.
Since the result-producing branch usually contains multiple references to the
function-defining branch(es), a crossover occurring in the function-defining
branch is usually leveraged in the sense that it simultaneously affects the
result-producing branch in several places. In contrast, a crossover occurring
in the result-producing branch provides no such leverage.
4.9 RESULTS WITH ADFs
We now examine one actual run of genetic programming with automatically
defined functions for the two-boxes problem.
The run starts with the creation of a population of 4,000 random programs;
the program shown in figure 4.12 is typical of such programs.
As one would expect, the 4,000 randomly generated individuals found in
generation 0 are not very good. The worst individual program
in the population for generation 0 has the enormous error (standardized fitness) of $3.07 \times 10^{38}$. This baffling program invokes its defined function ADF0
seven times in its result-producing branch and is shown below:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (- (* (* (% ARG2 ARG0) (+ ARG0 ARG1)) (+ (*
ARG1 ARG1) (- ARG0 ARG1))) (* (* (* ARG0 ARG0) (-
ARG2 ARG0)) (% (+ ARG2 ARG2) (- ARG1 ARG2))))))
(values (ADF0 (% (- (% W0 L0) (+ W0 L1)) (- (- W0 H0) (*
W0 H0))) (* (ADF0 (* W1 W0) (% W0 H1) (ADF0 L1 W1 L0))
(ADF0 (ADF0 H0 W0 H0) (- H0 W1) (* H1 H0))) (* (- (ADF0
L0 H0 H1) (ADF0 H1 L0 H0)) (% (* H1 H1) (- W0 L1)))))).
The average fitness of the population as a whole for generation 0 is
3.54xI0o1.
The median individual in the population for generation 0 has a fitness of
1538.5 and is
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (+ (% (* ARG2 ARG1) (+ ARG1 ARG2)) (% (% ARG2
ARG2) (- ARG0 ARG0)))))
(values (+ (- (+ W1 H1) (ADF0 H0 W0 W0)) (* (* W1 W0) (-
W0 W0))))).
Using the fact that the protected division function % returns 1 for an attempt
to divide by 0, we see that the defined function ADF0 is equivalent to
$\frac{\mathit{Arg1}\,\mathit{Arg2}}{\mathit{Arg1} + \mathit{Arg2}} + 1$.
Although ARG0 appears twice in ADF0, it plays no role in the value returned
by ADF0 because (- ARG0 ARG0) is 0 and (% <<x>> 0) returns 1 for
all <<x>>.
Substituting this expression into the one occurrence of ADF0 in the result-producing branch, we see that the result-producing branch is equivalent to
$W_1 + H_1 - \frac{W_0 W_0}{W_0 + W_0} - 1 + W_1 W_0 (W_0 - W_0) = W_1 + H_1 - \frac{W_0^2}{2 W_0} - 1$.
As can be seen, this expression bears little resemblance to the correct mathematical expression for the solution to this problem. Indeed, as is typical for
random individuals, this individual is partially blind and does not even use
three of the six independent variables that we know are needed to express a
solution to the problem.
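The simplification of the median individual can be checked numerically. The following Python sketch transcribes the two branches by hand (the function names `pdiv`, `adf0`, `rpb`, and `simplified` are ours, and the sample inputs are arbitrary); it assumes only that protected division returns 1 when the divisor is 0.

```python
import math


def pdiv(a, b):
    """Protected division %: returns 1 on an attempt to divide by 0."""
    return 1 if b == 0 else a / b


def adf0(arg0, arg1, arg2):
    # (+ (% (* ARG2 ARG1) (+ ARG1 ARG2)) (% (% ARG2 ARG2) (- ARG0 ARG0)))
    return pdiv(arg2 * arg1, arg1 + arg2) + pdiv(pdiv(arg2, arg2), arg0 - arg0)


def rpb(L0, W0, H0, L1, W1, H1):
    # (+ (- (+ W1 H1) (ADF0 H0 W0 W0)) (* (* W1 W0) (- W0 W0)))
    return ((W1 + H1) - adf0(H0, W0, W0)) + (W1 * W0) * (W0 - W0)


def simplified(W0, W1, H1):
    """Closed form of the result-producing branch (valid for nonzero W0)."""
    return W1 + H1 - W0 ** 2 / (2 * W0) - 1


# Arbitrary sample inputs in the order (L0, W0, H0, L1, W1, H1).
CASES = [(3, 4, 5, 6, 7, 8), (1, 2, 3, 4, 5, 6), (10, 9, 8, 7, 6, 5)]
for case in CASES:
    L0, W0, H0, L1, W1, H1 = case
    print(rpb(*case), simplified(W0, W1, H1))
```

Changing L0, L1, or H0 in any of the sample inputs leaves the output unchanged, which is the sense in which this individual is partially blind.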
The fourth-best individual from generation 0 has a fitness of 1,153 and is
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (* (- ARG1 ARG2) (% ARG2 ARG2))))
(values (* (ADF0 H1 W0 W1) (* L0 W1)))).
The automatically defined function here ignores one of its dummy variables
(ARG0). Since ADF0 is equivalent to (- ARG1 ARG2), the result-producing
branch is equivalent to
$L_0 W_1 (W_0 - W_1)$.
The best of generation 0 has a standardized fitness of 1,142 and is
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (% (* (% (- ARG2 ARG0) (% ARG2 ARG2)) ARG0) (*
(* (- ARG1 ARG0) (+ ARG2 ARG1)) ARG2))))
(values (- (* W1 W1) (* W1 (* (- L1 H0) L1))))).
This program has 16 terminals and 14 functions in the bodies of its function-defining branch and its result-producing branch, for a total of 30 points.
This best-of-generation individual for generation 0 does not invoke ADF0
in the result-producing branch, so the result produced by the program is
equivalent to
$W_1^2 - W_1 L_1 (L_1 - H_0)$.
Although this best-of-generation program bears little resemblance to the
correct mathematical expression for solving the problem, this program is better than the other 3,999 programs of generation 0.
The standardized fitness of the best-of-generation program progressively
improves to 1,101, 909, 823, 699, 697, and 96 between generations 1 and 6.
In generation 6, the best-of-generation program has 34 points, invokes its
ADF0 twice, and is
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (- (- ARG0 ARG0) (* (* ARG0 ARG1) (% ARG2 (%
ARG2 ARG2))))))
(values (- (+ (- (ADF0 L1 W1 H1) (ADF0 W0 H0 L0)) L0)
(+ (- H1 H1) (- (% L0 L0) H1))))).
The definition of ADF0 for this individual is equivalent to
$-\mathit{Arg0}\,\mathit{Arg1}\,\mathit{Arg2}$
and is the negative volume of a box of dimensions ARG0, ARG1, and ARG2.
The result-producing branch for the best of generation 6 is equivalent to
$W_0 H_0 L_0 - L_1 W_1 H_1 + L_0 - 1 + H_1$.
As it happens, the standardized fitness of the best of generation 7 is also 96;
however, this individual scores the same value of fitness in a very different
way. This 34-point individual invokes its ADF0 twice and is
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (- (- ARG1 ARG0) (* (* ARG0 ARG1) (% ARG2 (%
ARG2 ARG2))))))
(values (- (+ (- (ADF0 L1 W1 H1) (ADF0 W0 H0 L0)) L0)
(+ (- H1 H1) (- (% L0 L0) H1))))).
The function definition for ADF0 for this individual from generation 7
is equivalent to
$\mathit{Arg1} - \mathit{Arg0} - \mathit{Arg0}\,\mathit{Arg1}\,\frac{\mathit{Arg2}}{\left(\frac{\mathit{Arg2}}{\mathit{Arg2}}\right)} = \mathit{Arg1} - \mathit{Arg0} - \mathit{Arg0}\,\mathit{Arg1}\,\mathit{Arg2}$.
The result-producing branch for the best of generation 7 is equivalent to
$W_1 - L_1 - L_1 W_1 H_1 - H_0 + W_0 + W_0 H_0 L_0 + L_0 - 1 + H_1$.
In contrast to ADF0 from the best of generation 6 (which was an interpretable formula for negative volume), this ADF0 from the best of generation 7 is
a complex expression that has no obvious interpretation in the context of the
problem. Nonetheless, this ADF0, along with this result-producing branch, is
just as good as the easier-to-understand, equally fit program from
generation 6.
The trajectory of programs produced in the successive generations of a run
of genetic programming (viewed, say, from the perspective provided by the
best-of-generation programs) is considerably different from the trajectory of
versions of programs that a human programmer would produce in the process of creating and debugging a program.
The best-of-generation programs from generations 6 and 7 are both typical, in their own ways, of the intermediate results that are usually produced
by genetic programming. Both of these two intermediate results are approximately correct. The first two of the five terms of the best of generation 6, in
fact, represent an algebraically correct solution to the problem, while the last
three terms are extraneous and erroneous. The program from generation 7 is
typical of the opaque programs often evolved by genetic programming.
Genetically produced intermediate results from successive generations are
often closer and closer approximations to the 100%-correct solution to the
problem. They generally exhibit better and better fitness when measured in
terms of the fitness measure being used on the problem.
An improved program in a later generation of genetic programming is not
the consequence of the application of any logically sound or mathematically
valid rules of inference to any program from an earlier generation. Similarly,
a later program is not the result of an intellectual diagnosis of the deficiencies
or bugs in a previous program. None of the intermediate expressions
produced by genetic programming would probably ever appear in a debugging sequence created by a human programmer. An intermediate version of a
program produced by a human programmer is typically syntactically close
to the immediately preceding version (perhaps just a few keystrokes away).
Moreover, intermediate versions typically are syntactically close to the correct answer. However, successive versions of a program produced by a human
programmer are typically very distant when measured in terms of a fitness
measure based on the sum of the absolute errors between the results produced by the program and the correct answer. For example, a typical intermediate version of a program in a debugging sequence produced by a human
programmer might have an erroneous plus sign (instead of a minus sign) or
an incorrect reference to some incorrect variable, such as H1 (instead of W1).
In generation 13, the best-of-generation individual has 24 points, has a standardized fitness of 0.0, and scores 10 (out of 10) hits. This 100%-correct individual is
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (- (* ARG2 ARG0)
                    (* (+ ARG0 (* ARG0 ARG1))
                       (% ARG2 (% ARG2 ARG2))))))
       (values (- (ADF0 L1 W1 H1) (ADF0 W0 H0 L0)))).
The 15-point ADF0 in this program contains one subtraction, one addition,
three multiplications, two divisions, and eight terminals. It can be simplified to
$-\mathrm{Arg}_0\,\mathrm{Arg}_1\,\mathrm{Arg}_2$.
In other words, ADF0 is a calculation of the negative volume of the box of
dimensions ARG0, ARG1, and ARG2. The result-producing branch then invokes
this useful calculation twice and does a subtraction (in what a human programmer would consider to be "reverse" order) to produce a 100%-correct
solution to the problem.
An examination of ADF0 for this best-of-run program from generation 13
shows that it is the same as ADF0 of the best of generation 6.
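The simplification above can be checked numerically. The following is a minimal Python sketch, not the book's implementation: `pdiv` is a hypothetical name for the protected division function % (returning 1 for a zero denominator), and exact fractions are used only to avoid rounding. The input values are those of the first fitness case quoted later in this chapter (L0 = 3, W0 = 4, H0 = 7, L1 = 2, W1 = 5, H1 = 3).

```python
from fractions import Fraction


def pdiv(a, b):
    """Protected division %: returns 1 when the denominator is zero."""
    return Fraction(1) if b == 0 else Fraction(a) / Fraction(b)


def adf0(arg0, arg1, arg2):
    # (- (* ARG2 ARG0) (* (+ ARG0 (* ARG0 ARG1)) (% ARG2 (% ARG2 ARG2))))
    return arg2 * arg0 - (arg0 + arg0 * arg1) * pdiv(arg2, pdiv(arg2, arg2))


def program(l0, w0, h0, l1, w1, h1):
    # Result-producing branch: (- (ADF0 L1 W1 H1) (ADF0 W0 H0 L0))
    return adf0(l1, w1, h1) - adf0(w0, h0, l0)


# ADF0 behaves as negative volume: -(2 * 5 * 3) = -30 for the second box.
print(adf0(2, 5, 3))
# The whole program returns the difference in volumes, 84 - 30 = 54.
print(program(3, 4, 7, 2, 5, 3))
```

Because (% ARG2 ARG2) is 1 for every value of ARG2 under protected division, the inner quotient collapses to ARG2, which is why the whole body reduces to the negative product of the three arguments.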
Figure 4.13 100%-correct best-of-run program from generation 13 for the two-boxes problem with ADFs.
Figure 4.14 Structural complexity curves (best of generation and average, by generation) for the two-boxes problem with ADFs.
Figure 4.13 shows the 100%-correct best-of-run individual from generation 13 of this run with automatically defined functions as a rooted, point-labeled tree with ordered branches. The function-defining branch is on
the left of this figure and the result-producing branch is on the right. The
six points of types 1 through 6 above the horizontal dotted line are invariant. The body of the function-defining branch appears beneath the values of the function-defining branch (point 5). All of the 15 points in the
body of the function-defining branch are of type 7. The body of the result-producing branch appears beneath the values of the result-producing
branch (point 6). All of the nine points of the body of the result-producing branch
are of type 8.
Structural complexity measures the size of a program. The structural complexity of a program is a count of the number of times that the functions from
the function set and the terminals from the terminal set appear in a program.
This count excludes invariant points (such as the points of types 1-6 in figures 4.6, 4.12, and 4.13 that are above the dotted line). There are 24 points in
the 100%-correct best-of-run program from generation 13 shown in figure
4.13. The smaller the size of a program, the more parsimonious it is.
Figure 4.14 shows the structural complexity curves for this run of the two-boxes problem with automatically defined functions. The figure shows, by
generation, the structural complexity in the best-of-generation program and
the average of the values of structural complexity for all the programs in the
population.
Figure 4.15 Three-step top-down hierarchical approach applied to the two-boxes problem in
which there is a general mechanism for computing the negative volume of a box.
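The point count used as structural complexity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple encoding of the S-expressions and the name `count_points` are hypothetical, and only the bodies of the two branches are encoded, so the invariant points (progn, defun, the argument list, and the two values) are excluded by construction.

```python
def count_points(expr):
    """Count the functions and terminals in a nested-tuple S-expression."""
    if isinstance(expr, tuple):
        # One point for the function at the head plus the points of each argument.
        return 1 + sum(count_points(arg) for arg in expr[1:])
    return 1  # a terminal counts as one point


# Body of ADF0 from the generation-13 best-of-run program.
ADF0_BODY = ("-", ("*", "ARG2", "ARG0"),
             ("*", ("+", "ARG0", ("*", "ARG0", "ARG1")),
                   ("%", "ARG2", ("%", "ARG2", "ARG2"))))

# Body of the result-producing branch of the same program.
RPB_BODY = ("-", ("ADF0", "L1", "W1", "H1"), ("ADF0", "W0", "H0", "L0"))

adf_points = count_points(ADF0_BODY)
rpb_points = count_points(RPB_BODY)
print(adf_points, rpb_points, adf_points + rpb_points)  # 15 9 24
```

The totals agree with the 15 points in the function-defining branch, the nine points in the result-producing branch, and the 24 points overall reported for this program.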
The above 100%-correct solution evolved with automatically defined functions can be interpreted in the light of the three-step problem-solving process.
Genetic programming decomposed the overall problem into the simpler subproblem of computing negative volume. Genetic programming then solved
the problem of computing negative volume (in ADF0). Finally, genetic programming solved the overall problem in the result-producing branch by
using subtraction to assemble the two negative volumes into the difference of
the two volumes.
Figure 4.15 shows the application of the three-step top-down hierarchical
process to this problem. The first step is labeled "decompose" and produces
the subproblem of finding the negative volume of a box of dimensions ARG0,
ARG1, and ARG2. The second step is labeled "solve subproblem" and yields a
general mechanism for finding the negative volume (taking the negative of
the product of ARG0, ARG1, and ARG2). When this general mechanism is instantiated with the actual variables, L0, W0, and H0, of the first box (shown
with the labeled arrow), it produces the negative volume of the first box. Similarly, when it is instantiated with the actual variables, L1, W1, and H1, for the
second box, it produces the negative volume of the second box. The third step
is labeled "solve original problem" and solves the overall problem by invoking the general mechanism for finding the negative volume twice and by
assembling (using subtraction) the negative volume of the second box and
the negative volume of the first box.
The above solution to the two-boxes problem illustrates three of the five
ways in which the hierarchical problem-solving approach can be beneficial:
hierarchical decomposition, parameterized reuse, and abstraction.
First, hierarchical decomposition is manifested by the fact that the overall
program for solving the problem consists of a subroutine for computing the
negative volume and a result-producing branch.
Second, the two occasions on which the subroutine ADF0 is invoked with
different instantiations of values for its three dummy variables illustrate
parameterized reuse of the solution to the subproblem of computing the
negative volume. The function ADF0 is a general way of determining negative volume and may be reused on any combination of values or expressions.
Generalization occurs whenever parameterized reuse occurs.
Third, on each occasion when ADF0 is invoked with three particular actual
variables of the problem by the result-producing branch, abstraction is occurring. The exclusion of information that is irrelevant to solving the subproblem currently under consideration is an important aspect of the process of
decomposing an overall problem into subproblems. During the time that the
subroutine ADF0 is computing the negative volume of the first box with the
dimensions of the first box using L0, W0, and H0, the three other actual variables of the problem, L1, W1, and H1, are momentarily irrelevant. The subroutine identifies a subset of the information available from the overall problem
environment as being relevant to the solution of the subproblem and excludes
all the remaining available information. There are an infinite number of combinations of the three momentarily irrelevant dimensions, L1, W1, and H1, of
the second box, but none of them is relevant to the computation of the negative volume of the first box. The problem of computing the negative volume
of the first box is a problem within a three-dimensional subspace
of the overall six-dimensional space of the overall problem. The three-dimensional problem of computing the negative volume of the first box is abstracted
from the overall six-dimensional problem environment. When the time comes
to compute the negative volume of the second box, the three-dimensional
problem of computing the negative volume of the second box is similarly
abstracted from the overall problem environment.
As previously mentioned, the hierarchical three-step problem-solving process can also be described in a bottom-up way. First, one seeks to discover
useful regularities and patterns at the lowest level of the problem environment. Second, one changes the representation of the problem so that the problem becomes restated in terms of the regularities of the problem environment.
This change of representation creates a new problem. Third, one tries to discover a solution to the presumably-simpler new problem (Rendell and Seshu
1990; Ioerger, Rendell, and Subramaniam 1993; Ragavan and Rendell 1993).
In this run of the two-boxes problem, the concept of negative volume is a
regularity in the problem environment. Once one recognizes negative volume as a useful regularity for this problem, one changes the representation of
the problem so as to create a new problem. The bottom-up interpretation of
this run is that genetic programming discovered the regularity in the low
level representation of the problem (i.e., that a negative volume appeared
twice). Genetic programming then recoded (in ADF0) the problem in terms of
the discovered regularity into a new problem at a higher level (namely, a
problem involving negative volumes). That is, the representation was changed
from a problem involving six linear dimensions into a problem involving
two negative volumetric quantities. Third, genetic programming solved (in
the result-producing branch) the problem when restated in the terms of the
new representation (i.e., it discovered a subtraction in reverse order).
Table 4.4 A change of representation recodes the length, width, and height of each
box into its negative volume.
Fitness case    Negative volume    Negative volume       D
                of first box       of second box
     1               -84                -30              54
     2              -630                -30             600
     3              -360                -48             312
     4              -135                -24             111
     5               -24                -42             -18
     6                -9               -180            -171
     7              -405                -42             363
     8               -18                -54             -36
     9               -96               -120             -24
    10               -80                -35              45
Table 4.4 is a 10-by-4 table obtained from the original 10-by-8 table 4.1. The
10 combinations of the original six independent variables (L0, W0, H0, L1, W1,
and H1) have been recoded using ADF0 so that the second column of this
table now contains the negative volume of the first box and the third column
contains the negative volume of the second box. Specifically, the three independent variables associated with the first box (L0 = 3, W0 = 4, and H0 = 7)
have been restated as the negative volume of the first box (i.e., -84). Similarly,
the change of representation has recoded the three independent variables associated with the second box (L1 = 2, W1 = 5, and H1 = 3) as -30. The fourth
column contains the value of the dependent variable, D, associated with each
row of the table. These values are unchanged from the original table 4.1. For
example, 54 is the value of the dependent variable, D, associated with the
values (-84 and -30) of the two new independent variables in the first row of
this table. The result of this change of representation is a new problem, namely
a problem of symbolic regression involving two (not six) independent variables and one dependent variable. The original six independent variables of
the problem shown in table 4.1 are, for simplicity, not shown in table 4.4. The
new problem still has 10 fitness cases.
In practice, it may be easier to solve a problem of symbolic regression involving two independent variables than the original one with six independent variables. As it happens in this particular case, unbeknownst to genetic
programming, this change of representation converts a six-dimensional nonlinear problem to a problem of conventional linear regression (a mere subtraction).
Of course, not all changes of representations are useful. A recoding based
on the products of the three factors is useful for the two-boxes problem, but a
recoding based on raising one number to a power equal to the product of
two other numbers would not be beneficial in facilitating the solution of this
problem. Such a recoding would hopelessly encrypt the relatively simple
relationship existing among the variables of this problem. The problem would
be much more difficult to solve after such a change of representation.
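The effect of the change of representation can be illustrated with a short Python sketch. It is a check under stated assumptions rather than anything taken from the book: the list `TABLE_4_4` simply transcribes the three data columns of table 4.4, and `neg_volume` is a hypothetical stand-in for what ADF0 computes. Only the first fitness case is recoded from its six raw dimensions, because those are the only raw dimensions quoted in the text.

```python
# Rows of table 4.4: (negative volume of first box, negative volume of second box, D).
TABLE_4_4 = [
    (-84, -30, 54), (-630, -30, 600), (-360, -48, 312), (-135, -24, 111),
    (-24, -42, -18), (-9, -180, -171), (-405, -42, 363), (-18, -54, -36),
    (-96, -120, -24), (-80, -35, 45),
]


def neg_volume(length, width, height):
    """Recoding performed by ADF0: three linear dimensions to one negative volume."""
    return -(length * width * height)


# First fitness case: (L0, W0, H0) = (3, 4, 7) and (L1, W1, H1) = (2, 5, 3).
first_case = (neg_volume(3, 4, 7), neg_volume(2, 5, 3))
print(first_case)  # (-84, -30)

# In the new representation D is a plain subtraction "in reverse order":
# negative volume of the second box minus negative volume of the first box.
residuals = [d - (second - first) for first, second, d in TABLE_4_4]
print(residuals)
```

Every residual is zero, which is the sense in which the recoded problem is a linear one: a single subtraction of the two new independent variables reproduces D on all 10 fitness cases.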
Although we can interpret what genetic programming does as a realization of the three-step hierarchical problem-solving process, none of the three
steps (either in the top-down or bottom-up form) appear as steps of genetic
programming. There is no explicit decomposition of the original problem into
subproblems; there is no separate and explicit solution of subproblems; and
there is no explicit assembly of solutions to subproblems into a solution of the
overall problem. Similarly, there is no explicit search or discovery of regularities or patterns; there is no separate and explicit recoding or changing of the
representation; and there is no separate and explicit solution of any new problem in the terms of any new higher level representation.
This example of the successful operation of genetic programming with
automatically defined functions is the first of numerous successful examples
in this book that provide evidence supporting main point 1:
Main point 1: Automatically defined functions enable genetic programming to solve a variety of problems in a way that can be interpreted as a
decomposition of a problem into subproblems, a solving of the subproblems,
and an assembly of the solutions to the subproblems into a solution to the
overall problem (or which can alternatively be interpreted as a search for
regularities in the problem environment, a change of representation, and a
solving of a higher level problem).
Of course, it is unlikely that a human programmer would ever write a subroutine for negative volume and a main program that performed the subtraction in what most human programmers would regard as "reverse" order.
However, it is important to remember that genetic programming does not
produce computer programs in the style of a human programmer. Genetic
programming is driven by its fitness measure, and not by the post hoc considerations that humans think of after they see the solution produced by genetic
programming. The calculation of negative volume and the reverse subtraction are every bit as fit as a calculation of positive volume and subtraction in
the more familiar order.
Since genetic programming is a probabilistic process, the result (i.e., the
best-of-run program) produced by genetic programming is almost always
different from one run to the next (although each result may be equally good
in terms of solving the problem).
The following additional four runs of this problem illustrate the above
points.
In a second run with automatically defined functions, the following 100%-correct program emerged in generation 13 of that run:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (- (- (% ARG0 ARG0) (* ARG1 ARG0))
                    (+ (% (* ARG2 ARG1) (+ ARG1 ARG0))
                       (- (% ARG1 ARG2) (- ARG0 ARG0))))))
       (values (+ (* W1 (- (* W0 H0) (* L1 H1)))
                  (ADF0 (- W1 L0) (* H0 W0) (- W0 W0))))).
Here the defined function is equivalent to
$$\frac{\mathrm{Arg}_0}{\mathrm{Arg}_0} - \mathrm{Arg}_1 \mathrm{Arg}_0 - \frac{\mathrm{Arg}_2 \mathrm{Arg}_1}{\mathrm{Arg}_1 + \mathrm{Arg}_0} - \frac{\mathrm{Arg}_1}{\mathrm{Arg}_2} + \mathrm{Arg}_0 - \mathrm{Arg}_0 = 1 - \mathrm{Arg}_1 \mathrm{Arg}_0 - \frac{\mathrm{Arg}_2 \mathrm{Arg}_1}{\mathrm{Arg}_1 + \mathrm{Arg}_0} - \frac{\mathrm{Arg}_1}{\mathrm{Arg}_2}.$$
If the reader thought that the computation of negative volume was odd, the
defined function found in this 100%-correct solution to the problem is positively bizarre.
The result-producing branch then calls this defined function with arguments of (- W1 L0), (* H0 W0), and (- W0 W0). Since the third argument
(- W0 W0) is equivalent to zero and division by zero (using the protected
division function %) equals 1, the result-producing branch is equivalent to
$$\begin{aligned}
& W_1 (W_0 H_0 - L_1 H_1) + 1 - H_0 W_0 (W_1 - L_0) - 1 \\
&= W_1 W_0 H_0 - H_1 W_1 L_1 + 1 - W_1 W_0 H_0 + H_0 W_0 L_0 - 1 \\
&= -H_1 W_1 L_1 + 1 + H_0 W_0 L_0 - 1 \\
&= H_0 W_0 L_0 - H_1 W_1 L_1.
\end{aligned}$$
Genetic programming decomposed the original problem (which can be
solved, as already shown, by a relatively simple difference between two products) into the unexpected subproblem of computing
$$1 - \mathrm{Arg}_1 \mathrm{Arg}_0 - \frac{\mathrm{Arg}_2 \mathrm{Arg}_1}{\mathrm{Arg}_1 + \mathrm{Arg}_0} - \frac{\mathrm{Arg}_1}{\mathrm{Arg}_2}.$$
It then assembled the solution to this subproblem into a 100%-correct solution
to the overall problem. Although this decomposition seems bizarre, it is just
as good as the somewhat more straightforward solution involving negative
volume, when measured in terms of the fitness measure governing this problem (i.e., finding a good fit to the data in table 4.1).
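The algebra for this second run can be replayed numerically. The sketch below is an illustration only: `pdiv` is a hypothetical name for protected division, exact fractions are an implementation convenience, and the one fitness case evaluated is the first row quoted in the text (L0 = 3, W0 = 4, H0 = 7, L1 = 2, W1 = 5, H1 = 3).

```python
from fractions import Fraction


def pdiv(a, b):
    """Protected division %: returns 1 when the denominator is zero."""
    return Fraction(1) if b == 0 else Fraction(a) / Fraction(b)


def adf0(arg0, arg1, arg2):
    # (- (- (% ARG0 ARG0) (* ARG1 ARG0))
    #    (+ (% (* ARG2 ARG1) (+ ARG1 ARG0)) (- (% ARG1 ARG2) (- ARG0 ARG0))))
    return (pdiv(arg0, arg0) - arg1 * arg0) - (
        pdiv(arg2 * arg1, arg1 + arg0) + (pdiv(arg1, arg2) - (arg0 - arg0)))


def program(l0, w0, h0, l1, w1, h1):
    # (+ (* W1 (- (* W0 H0) (* L1 H1))) (ADF0 (- W1 L0) (* H0 W0) (- W0 W0)))
    return w1 * (w0 * h0 - l1 * h1) + adf0(w1 - l0, h0 * w0, w0 - w0)


# ADF0 is called with Arg0 = 2, Arg1 = 28, Arg2 = 0 for the first fitness case.
print(adf0(5 - 3, 7 * 4, 4 - 4))  # -56
print(program(3, 4, 7, 2, 5, 3))  # 54
```

One caveat, offered as an observation rather than as a claim from the book: the simplification treats the middle term as zero because its numerator is zero, which holds only while Arg1 + Arg0 is nonzero. For inputs where H0 W0 + W1 - L0 happens to be zero, protected division returns 1 for that term and the program's value differs from the difference in volumes by 1.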
In viewing this 100%-correct solution to the problem, one is reminded of
John Kendrew's reaction (1958) as the first human to see the three-dimensional structure of a protein:
"Perhaps the most remarkable features of the molecule are its complexity and
lack of symmetry. The arrangement seems to be almost totally lacking in the
kind of regularities which one instinctively anticipates, and it is more complicated than has been predicted by any theory of protein structure."
This example, along with numerous examples that will appear later in this
book, provides evidence to support main point 2:
Main point 2: Automatically defined functions discover and exploit the
regularities, symmetries, homogeneities, similarities, patterns, and
modularities of the problem environment in ways that are very different from
the style employed by human programmers.
In a third run with automatically defined functions, the following
100%-correct program emerged in generation 14:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (+ (- (+ ARG1 ARG0) (+ ARG1 ARG0))
                    (* ARG0 ARG2))))
       (values (- (ADF0 L0 L1 (* W0 H0))
                  (ADF0 H1 H1 (ADF0 L1 W0 W1))))).
Although ADF0's second (middle) argument, ARG1, appears twice in ADF0,
it plays no role in the value returned by ADF0. Instead, ADF0 is equivalent to
the positive area of the rectangle whose sides are ARG0 and ARG2, namely
$\mathrm{Arg}_0\,\mathrm{Arg}_2$.
One invocation of ADF0 is equivalent to the product (* L0 (* W0 H0)).
Another invocation of ADF0 produces the product (* H1 (ADF0 L1 W0
W1)), which is equivalent to (* H1 L1 W1). The 100%-correct solution is
obtained in the result-producing branch by using the calculation for area in
the following way:
$$L_0 (W_0 H_0) - H_1 (L_1 W_1) = L_0 W_0 H_0 - H_1 L_1 W_1.$$
It would be unlikely to occur to a human programmer to solve this problem in this indirect way. Indeed, this approach is especially unlikely since the
calculation of the area of a rectangle is merely ordinary two-argument multiplication, which is already available as one of the primitive functions in the
result-producing branch. Human programmers do not usually write subroutines that duplicate the functionality of already-available primitive operations.
However, genetic programming sometimes uses automatically defined functions to recreate already-available primitive functions.
In a fourth run, the three-argument automatically defined function of the
100%-correct program that emerged in generation 22 merely returns its second argument, ARG1, to the result-producing branch (i.e., ADF0 is a projection). The result-producing branch then uses this ADF0 to yield one of the
terminals, W0, already available in the terminal set of the problem.
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values ARG1))
       (values (+ (- W0 (ADF0 (% H1 H1) W0 (* L1 H1)))
                  (- (* L0 (* W0 H0)) (* W1 (* L1 H1)))))).
Although human programmers do not usually call a subroutine from a main
program for the sole purpose of returning a variable that is already available
in the main program, this solution is just as fit as its predecessor.
In a fifth run, the 100%-correct program shown below emerged in generation 23. In this program the result-producing branch ignores the elaborate
function defined in its function-defining branch and simply solves the problem at hand without using its automatically defined function.
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (* (- ARG1 ARG2) (+ (% ARG2 ARG0) (* (+ (% (*
          ARG1 ARG1) (+ (+ ARG2 ARG1) ARG1)) (- (- (% ARG2
          ARG2) (% (* ARG2 ARG0) (+ ARG2 ARG1))) (* ARG2
          ARG1))) (+ (- ARG1 ARG0) (% (+ ARG2 ARG1) ARG1)))))))
       (values (+ (+ (* (- (% H0 (% L1 L0)) W1) (* H1 L1))
                     (- L0 L0)) (* (- W0 H1) (* L0 H0))))).
The availability of an automatically defined function imposes no obligation
on the result-producing branch to call it.
Other genetically evolved programs solved this problem using negative
area and some even solved this problem by using the available subroutine to
define ordinary positive volume.
In summary, the automatically defined function ADF0 created by genetic
programming can be volume, negative volume, area, negative area, or an
entirely obscure function that has no simple explanation in terms of the problem domain. We did not predetermine which of these concepts would be
used for the purpose of decomposing and solving the problem. Similarly, we
did not predetermine whether the automatically defined function would be
referenced once, twice, many times, or not at all. The functionality of the
automatically defined function as well as the role, if any, assigned to it by the
result-producing branch are both subject to the evolutionary process. Neither
the function-defining branch nor the result-producing branch is necessarily
elegant, orderly, parsimonious, predictable, or susceptible to any simple
interpretation.
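The third, fourth, and fifth runs described above can be compared side by side with a small Python sketch. The function names (`run3`, `run4`, `run5`, `pdiv`) are hypothetical, exact fractions are used only to keep division exact, and the unused ADF0 of the fifth run is deliberately omitted because its result-producing branch never calls it. The single evaluation shown uses the first fitness case quoted in the text.

```python
from fractions import Fraction


def pdiv(a, b):
    """Protected division %: returns 1 when the denominator is zero."""
    return Fraction(1) if b == 0 else Fraction(a) / Fraction(b)


def run3(l0, w0, h0, l1, w1, h1):
    # ADF0 reduces to ARG0 * ARG2 (an area); ARG1 cancels out.
    def adf0(arg0, arg1, arg2):
        return ((arg1 + arg0) - (arg1 + arg0)) + arg0 * arg2
    return adf0(l0, l1, w0 * h0) - adf0(h1, h1, adf0(l1, w0, w1))


def run4(l0, w0, h0, l1, w1, h1):
    # ADF0 is a projection onto its second argument.
    def adf0(arg0, arg1, arg2):
        return arg1
    return (w0 - adf0(pdiv(h1, h1), w0, l1 * h1)) + (
        l0 * (w0 * h0) - w1 * (l1 * h1))


def run5(l0, w0, h0, l1, w1, h1):
    # The result-producing branch never calls ADF0.
    return ((pdiv(h0, pdiv(l1, l0)) - w1) * (h1 * l1) + (l0 - l0)) + (
        (w0 - h1) * (l0 * h0))


case_1 = (3, 4, 7, 2, 5, 3)  # L0, W0, H0, L1, W1, H1
print(run3(*case_1), run4(*case_1), run5(*case_1))  # 54 54 54
```

All three return 84 - 30 = 54 for this case, even though ADF0 plays three different roles: a redundant multiplication, a projection, and no role at all.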
4.10 COMPARISON OF THE STRUCTURAL COMPLEXITY OF THE
SOLUTIONS
In the foregoing sections, we looked at only selected illustrative runs of genetic
programming with and without automatically defined functions. In the next
two sections, we study a series of runs in order to obtain statistics for comparing results with and without automatically defined functions. We first consider the structural complexity (size) of the best-of-run programs from the
successful runs within a series of runs.
We made a series of 33 runs of the two-boxes problem without automatically defined functions and 93 runs with them.
A run is considered to be a successful run if at least one program in the
population satisfies the success predicate of the problem by generation G. If a
best-of-run program satisfying the success predicate is 100%-correct, we call
it a solution; otherwise, we refer to a best-of-run program satisfying the success predicate as a satisfactory result.
Nine of the 33 runs (27%) without automatically defined functions and 15
of the 93 runs (16%) with automatically defined functions were successful
runs by generation 50.
The average structural complexity, $S$, of a specified set of programs is the
average of the values of structural complexity for each program in the set.
When the specified set is the population as a whole, $S$ is the average of the
values of structural complexity for all the programs in the population. However, $S$ most frequently appears in this book for the small set of best-of-run
programs that actually satisfy the success predicate of the problem over a
series of runs. Specifically, $S_{without}$ is the average structural complexity of the
best-of-run programs that satisfy the success predicate of the problem from a
series of runs without automatically defined functions. $S_{with}$ is similarly
defined for runs with automatically defined functions. Note that $S_{without}$ and
$S_{with}$ are each based on at most one program from each run.
The average structural complexity, $S_{without}$, of the best-of-run programs from
the nine successful runs without automatically defined functions is 17.8 points.
The average structural complexity, $S_{with}$, of the best-of-run programs from
the 15 successful runs with automatically defined functions is 33.5 points.
The structural complexity ratio, $R_S$, of a problem is the ratio of $S_{without}$ to
$S_{with}$. For this problem,
$$R_S = \frac{\text{Structural complexity without ADFs}}{\text{Structural complexity with ADFs}} = \frac{S_{without}}{S_{with}} = \frac{17.8}{33.5} = 0.53.$$
Since the structural complexity ratio, $R_S$, is less than 1 for the two-boxes
problem, the best-of-run programs from the successful runs with automatically defined functions are bigger (less parsimonious) than the best-of-run
programs from the successful runs without automatically defined functions.
That is, for this particular problem, automatically defined functions are not
advantageous in terms of average structural complexity.
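The arithmetic behind these figures is small enough to restate as a Python sketch. The variable names are hypothetical; the numbers (33 and 93 runs, 9 and 15 successes, 17.8 and 33.5 points) are the ones reported above.

```python
# Run counts and successes reported for the two-boxes problem.
runs_without, successes_without = 33, 9
runs_with, successes_with = 93, 15

# Average structural complexity of the successful best-of-run programs.
s_without, s_with = 17.8, 33.5

rate_without = successes_without / runs_without
rate_with = successes_with / runs_with
r_s = s_without / s_with  # structural complexity ratio

print(f"{rate_without:.0%} {rate_with:.0%} {r_s:.2f}")  # 27% 16% 0.53
```

A ratio below 1 means the successful programs with automatically defined functions are, on average, the larger ones.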
4.11 COMPARISON OF COMPUTATIONAL EFFORT
This section describes one way of measuring the computational effort, E,
required to yield a solution (or satisfactory result) to a problem with a satisfactorily high probability.
We obtain E empirically from a series of runs. Each run is made using a
particular fixed population size, M, and a particular fixed maximum number
of generations, G.
The number of fitness evaluations that must be executed to yield a solution
(or satisfactory result) to a problem is a reasonable measure of computational
burden in an adaptive algorithm. Every adaptive algorithm starts with one
or more points in the search space of the problem and then iteratively performs the following two steps: measuring the fitness of the current point(s)
and using the information about fitness to create new point(s) in the search
space. Fitness evaluations are common to all adaptive algorithms (probabilistic or deterministic alike), including genetic algorithms, hillclimbing, neural nets, simulated annealing, genetic classifier systems, and genetic
programming. This is the case regardless of what algorithm-specific or problem-specific name may be used for fitness (e.g., payoff, goodness, benefit,
score, profit, cost, utility, and error). Fitness evaluations consume a significant fraction (often an overwhelming fraction) of the computer resources
required for nontrivial problems. Even if the creating of new points is also
computationally intensive for a particular algorithm, there is one fitness evaluation associated with each such step of creation, so the number of fitness evaluations is still a reasonable common measure.
If every run of genetic programming were successful in yielding a solution
(or satisfactory result), the number of fitness evaluations required to yield a
solution (or satisfactory result) would be easy to measure. If success occurs
on the same generation of every run, the number of fitness evaluations would
merely be the product of the population size, M, and the number of generations that are run (assuming that exactly M operations are executed on
each generation). If success occurs on different generations in different runs
(but is guaranteed to occur), the number of fitness evaluations would be the
product of the population size, M, and the average number of generations
that are run.
Since genetic programming is a probabilistic algorithm, not all runs are
successful at yielding a solution to the problem by generation G.
When a particular run of genetic programming is not successful after
running the prespecified maximum number of generations, G, there is no
way to know whether or when the run would ever be successful. When a
successful outcome cannot be guaranteed for every run, there is no knowable value for the number of generations that will yield a solution (or satisfactory result) and the simple calculation described above cannot be used.
Consequently, a probabilistic calculation is required in order to compute
the number of fitness evaluations required to yield a solution (or satisfactory result) to a problem.
We can empirically observe the probability, Y(M,i), that a run yields,
for the first time, at least one program in the population satisfying the
success predicate of the problem on generation i.
Once we have obtained this estimate of the instantaneous probability of
success Y(M,i) for each generation i, we can compute an estimate of the
cumulative probability of success, P(M,i), that a particular run with a population
size, M, yields a solution by generation i. The cumulative probability, P(M,i),
is, of course, a monotonically nondecreasing function of the generation i. If every
run in the series yields a program satisfying the success predicate by generation G, P(M,G) will be 1.0. In practice, P(M,G) will often fall short of 1.00.
The probability of satisfying the success predicate by generation i at least
once in R independent runs is $1 - [1 - P(M,i)]^R$. If we want to satisfy the success predicate with a certain specified probability z, then it must be that
$$z = 1 - [1 - P(M,i)]^R.$$
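The relationship between Y(M,i) and P(M,i) can be sketched as follows. This is an illustration under a loudly stated assumption: the list `FIRST_SUCCESS` is not data from the book. It is one set of first-success generations for 9 of 33 runs that would be consistent with the rounded success percentages reported for the runs without automatically defined functions in this chapter. The function names are hypothetical as well.

```python
from fractions import Fraction

RUNS = 33   # number of independent runs in the series
G = 50      # last generation of each run

# ASSUMPTION: generation of first success for each successful run, inferred
# from rounded percentages; not taken from the book's raw data.
FIRST_SUCCESS = [4, 5, 5, 11, 12, 13, 19, 19, 25]


def instantaneous(first_success, runs, last_gen):
    """Y(M,i): fraction of runs that first succeed exactly on generation i."""
    return [Fraction(first_success.count(i), runs) for i in range(last_gen + 1)]


def cumulative(y):
    """P(M,i): running sum of Y(M,i), the fraction of runs successful by generation i."""
    p, total = [], Fraction(0)
    for y_i in y:
        total += y_i
        p.append(total)
    return p


def at_least_one_success(p, r):
    """Probability of at least one success in r independent runs: 1 - (1 - p)^r."""
    return 1 - (1 - p) ** r


Y = instantaneous(FIRST_SUCCESS, RUNS, G)
P = cumulative(Y)
print(float(P[4]), float(P[5]), float(P[50]))
print(float(at_least_one_success(P[5], 49)))
```

Under this assumed data, 49 runs taken to generation 5 give a probability of at least one success just above 0.99, while 48 runs fall just short.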
Throughout this book, z = 99%. As can be seen, the number of independent
runs, R, required to satisfy the success predicate by generation i with a satisfactorily high probability of z = 99%, depends on both z and P(M,i). After
taking logarithms, we find
$$R = R(M,i,z) = \left\lceil \frac{\log(1 - z)}{\log(1 - P(M,i))} \right\rceil,$$
where the brackets indicate the ceiling function for rounding up to the next
highest integer.
Figure 4.16 Number of independent runs required, R(M,i,z), as a function of the cumulative probability of success P(M,i) for z = 99%.
Figure 4.17 The portion of the R(M,i,z) curve for which P(M,i) > 0.4 shows the granular
nature of the function.
Figure 4.18 Performance curves for the two-boxes problem showing that it is sufficient to
process 1,176,000 individuals to yield a satisfactory result for this problem with 99% probability
and that $E_{without}$ = 1,176,000 without ADFs. (Panel title: Without Defined Functions; horizontal axis: Generation; marked points: (4, 3%) and (50, 27%).)
Figure 4.16 shows a graph of the number of independent runs, R(M,i,z),
required to yield at least one successful run with probability z = 99% as a
function of the cumulative probability of success, P(M,i). The higher the
probability of success, the fewer independent runs are required to yield at
least one successful run. For example, if the cumulative probability of success, P(M,i), is a mere 0.09, then 49 independent runs are required to yield a
successful run with a 99% probability.
The R(M,i,z) function has a step-like nature that is caused by the ceiling
function.
Figure 4.17 spotlights the step-like nature of the R(M,i,z) function by showing only the boxed portion of figure 4.16 where P(M,i) > 0.4. If P(M,i) is
0.68, only four independent runs are required; if P(M,i) is 0.78, then only
three runs are required; and if P(M,i) is 0.90, only two runs are required. Of
course, if P(M,i) is 0.99, only one run is required. Because the values of P(M,i)
are estimates obtained from empirically observed values of Y(M,i), the value
R(M,i,z) is also an estimate.
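The number of runs required follows from solving $z = 1 - [1 - P(M,i)]^R$ for R and rounding up. A minimal Python sketch of that calculation is given below; the function names are hypothetical, and returning infinity for a zero probability is a convention chosen here to mirror the remark that the required number of runs is then unbounded.

```python
import math


def runs_required(p, z=0.99):
    """R(M,i,z): ceiling of log(1 - z) / log(1 - p) for a success probability p."""
    if p <= 0.0:
        return math.inf  # no number of runs suffices when success was never observed
    if p >= 1.0:
        return 1         # every run succeeds
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p))


def smallest_probability_for(runs, z=0.99):
    """Smallest p for which the given number of runs reaches probability z."""
    return 1.0 - (1.0 - z) ** (1.0 / runs)


print(runs_required(0.09))    # 49
print(runs_required(1 / 33))  # 150
for r in (4, 3, 2):
    print(r, round(smallest_probability_for(r), 3))
```

The second helper shows where the steps of the curve fall: the thresholds for four, three, and two runs come out near 0.684, 0.785, and 0.900, close to the rounded values 0.68, 0.78, and 0.90 quoted above.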
Figure 4.18 presents two related curves, called the performance curves, for
the two-boxes problem without automatically defined functions. The curves
are based on 33 independent runs, each with a population size, M, of 4,000
and a maximum number of generations to be run, G, of 51 (i.e., generation 0
through generation 50). A total of 79 performance curves will appear in this
book, so we now explain in detail the standard form of these curves.
The rising curve in figure 4.18 shows, by generation, the experimentally
observed cumulative probability of success, P(M,i), of solving the problem
by generation i. This curve shows that the cumulative probability of success
is 0.0% for generation 0 over these 33 runs. That is, this blind random search
of 132,000 points (4,000 x 33 runs) of program space did not unearth a satisfactory result for the problem. The rising curve also shows that the cumulative probability of success stays at 0.0% for generations 1 through 3, and then
rises to 3% for generation 4, 9% for generations 5 through 10, 12% for generation 11, 15% for generation 12, 18% for generations 13 through 18, 24% for
generations 19 through 24, and reaches 27% for generations 25 through 50.
The second curve (made with lozenges) in figure 4.18 shows, by generation, the total number of individuals that must be processed, I(M,i,z), in order
to yield a solution (or satisfactory result) for the problem with z = 99% probability for a population size, M, by generation i. Specifically,
$$I(M,i,z) = M\,(i + 1)\,R(z),$$
where R(z) abbreviates R(M,i,z).
The computational effort, E, required to yield a solution (or satisfactory result)
for the problem with a stated probability z is the minimal value of I(M,i,z)
over all the generations i between 0 and G. The first generation at which
I(M,i,z) attains this minimum value is called the best generation, i*. Thus,
the computational effort is
$$E = I(M,i^*,z) = M\,(i^* + 1)\,R(z).$$
Note that this retrospective and empirical method of measuring computational effort, E, depends on the particular choices of values for M and G and
all the other quantitative and qualitative parameters that control a run of
genetic programming. The value of E thus obtained is not necessarily the
minimum computational effort possible for the problem.
The second curve in figure 4.18 shows, for example, that I(M,i,z) is undefined for generations 0 through 3 because the observed probability of success,
P(M,i), is zero for these early generations and hence the required number of
runs, R(M,i,z), is infinite. Of course, if we did an extremely large number of
independent runs, we would find that the probability of success at generation 0 is, in fact, some small nonzero value. This nonzero probability is the
probability of solving the problem by means of blind random search in program space. I(M,0,z) could then be computed from this probability and
would, of course, be colossal for any nontrivial problem.
For generation 4 the probability of success, P(M,i), has the nonzero value of 3%. Consequently, R(M,i,z) is 150. If this problem is run through to generation 4 and then abandoned (i.e., a total of 5 generations from generation 0 through to generation 4), processing a total of 3,000,000 individuals (i.e., 4,000 x 5 generations x 150 runs) is sufficient to yield a solution (or satisfactory result) for this problem with 99% probability. That is, I(M,4,z) is 3,000,000.
For generation 5, where the probability of success is 9%, R(M,i,z) is 49. If this problem is run through to generation 5 and then abandoned, processing a total of 1,176,000 individuals (i.e., 4,000 x 6 generations x 49 runs) is sufficient to yield a solution (or satisfactory result) for this problem with 99% probability. In other words, I(M,5,z) is 1,176,000.
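The two worked values above can be reproduced with a few lines of Python. This is a minimal sketch, not code from the book: the function names are hypothetical, and it assumes that the quoted 3% and 9% are rounded forms of 1 and 3 successful runs out of the 33 runs made.

```python
import math

def runs_required(p_success: float, z: float = 0.99) -> int:
    """Independent runs needed so that at least one succeeds with probability z."""
    if p_success <= 0.0:
        raise ValueError("R is unbounded when the observed probability is zero")
    if p_success >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def individuals_to_process(m: int, generation: int, p_success: float,
                           z: float = 0.99) -> int:
    """I(M,i,z) = M * (i + 1) * R, counting generation 0 as one generation."""
    return m * (generation + 1) * runs_required(p_success, z)

M = 4000      # population size from the text
N_RUNS = 33   # number of runs behind figure 4.18

# Assumption: 3% and 9% correspond to 1/33 and 3/33.
r4 = runs_required(1 / N_RUNS)
i4 = individuals_to_process(M, 4, 1 / N_RUNS)
r5 = runs_required(3 / N_RUNS)
i5 = individuals_to_process(M, 5, 3 / N_RUNS)
print(r4, i4, r5, i5)
```

Using the exact fractions rather than the rounded percentages matters here: with P = 0.03 exactly, the ceiling would land on a different integer than the 150 quoted in the text.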
Figure 4.19 Performance curves for the two-boxes problem showing that $E_{with}$ = 2,220,000 with ADFs.
Panel: With Defined Functions. Horizontal axis: Generation.
For generations 6 through 10 the observed probability of success, P(M,i), remains constant at 9% for this particular series of observations. As a result, R(M,i,z) remains constant at 49 for generations 6 through 10. Therefore, more than I(M,5,z) = 1,176,000 individuals must be processed to yield a solution (or satisfactory result) for this problem with 99% probability for generations 6, 7, 8, 9, and 10 because the product of 4,000 and 49 is being multiplied by the progressively larger values 7, 8, 9, 10, and 11, respectively. For generation 6, I(M,6,z) is 1,372,000 (i.e., 4,000 x 7 generations x 49 runs). By generation 10, I(M,10,z) reaches a value of 2,156,000 (i.e., 4,000 x 11 generations x 49 runs).
When the observed P(M,i), and consequently R(M,i,z), remains constant over several generations, the plot of I(M,i,z) resembles the rising edge of a sawtooth for those generations. As a result, the value of I(M,5,z) = 1,176,000 attained at generation 5 continues to maintain its position as the current global minimum for I(M,i,z) up to generation 10. The sawtooth is, of course, an artifact of the empirically observed values of P(M,i).
For generation 11 the observed probability of success, P(M,i), rises to 12% and R(M,i,z) drops to 36. If this problem is run through to generation 11 and then abandoned, processing a total of 1,728,000 individuals (i.e., 4,000 x 12 generations x 36 runs) is sufficient to yield a solution (or satisfactory result) for this problem with 99% probability. This value of I(M,11,z) of 1,728,000 for generation 11 is less than the value of I(M,10,z) of 2,156,000 for generation 10, so we see a falling edge to the sawtooth that started to rise at generation 5. However, 1,728,000 is more than 1,176,000, so the value of 1,176,000 for I(M,i,z) attained at generation 5 maintains its position as the global minimum for I(M,i,z) up to generation 11.
As can be seen in figure 4.18, the value of 1,176,000 for I(M,i,z) attained at generation 5 continues to maintain its position as the global minimum for I(M,i,z) up to generation 50. Because each successive tooth of the sawtooth
starts at a higher point, generation 5 is almost certainly the global minimum for I(M,i,z) for all i. Consequently, the best generation, i*, is 5 and the computational effort without automatically defined functions, $E_{without}$, is 1,176,000 for this problem. The numbers 5 and 1,176,000 are placed in the oval in the figure to indicate this fact. A thin gray vertical line is used in the figure to highlight the fact that generation 5 is the best generation. The value of R(M,i,z) for the best generation i* is called R(z).
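The sawtooth and its global minimum can be traced numerically. The sketch below is illustrative only: the cumulative success counts are not given in the text and are inferred here (an assumption) from the quoted percentages for 33 runs, and the helper names are hypothetical.

```python
import math

M, N_RUNS, Z = 4000, 33, 0.99

def successes_by(generation: int) -> int:
    """Cumulative successful runs by a generation (inferred from figure 4.18 text)."""
    # (first generation at which the count applies, count out of 33 runs)
    thresholds = [(25, 9), (19, 8), (13, 6), (12, 5), (11, 4), (5, 3), (4, 1)]
    for first_generation, count in thresholds:
        if generation >= first_generation:
            return count
    return 0

def individuals(generation: int) -> float:
    """I(M,i,z); infinite while no run has yet succeeded."""
    p = successes_by(generation) / N_RUNS
    if p == 0.0:
        return math.inf
    runs = math.ceil(math.log(1.0 - Z) / math.log(1.0 - p))
    return M * (generation + 1) * runs

curve = [individuals(i) for i in range(51)]   # generations 0 through 50
effort = min(curve)                           # computational effort E
best_generation = curve.index(effort)         # first generation attaining E
print(best_generation, effort)
```

Because `list.index` returns the first position of the minimum, the sketch follows the text's convention that i* is the first generation at which the minimum is attained.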
This minimal value of I(M,i,z) of 1,176,000 is a measure of the computational effort without automatically defined functions, $E_{without}$, necessary to yield a solution (or satisfactory result) for this problem with 99% probability. If we were required to solve this problem, were committed to a population size, M, of 4,000, and knew in advance that the global minimum of I(M,i,z) occurs at generation 5, then the least expected number of fitness evaluations would be expended in solving this problem by making a series of independent runs and abandoning each such run at generation 5.
Three points on each rising cumulative probability curve are highlighted in figure 4.18. First, the value of cumulative probability is noted at the first generation for which the cumulative probability becomes nonzero. For this figure, this occurs at generation 4, where the probability is 3%. Second, the cumulative probability is noted at generation i* (the generation number highlighted in the oval). The cumulative probability is 9% for generation 5 for figure 4.18. Third, the value of the cumulative probability is noted at the final generation (generation 50). The cumulative probability is 27% for generation 50 for this figure.
The rectangular legend in figure 4.18 containing four items recites the fact that the population size, M, is 4,000, that the probability z is 99%, that the number of runs, R(z), required to yield a solution (or satisfactory result) for this problem at 99% probability is 49 for the best generation i*, and that the curves in this figure are based on N = 33 independent runs.
Figure 4.19 shows the performance curves for the two-boxes problem with automatically defined functions. The population size, M, of 4,000 is the same as for the previous figure and z = 99%. This figure is based on N = 93 runs. The cumulative probability of success is 16% at generation 50 (thus yielding 15 satisfactory results out of the 93 runs). The cumulative probability of success is 12% at generation 14, so R(z) = 37. The numbers 14 and 2,220,000 in the oval indicate that, if this problem is run through to generation 14, processing a total of $E_{with}$ = 2,220,000 individuals (i.e., 4,000 x 15 generations x 37 runs) is sufficient to yield a satisfactory result for this problem with 99% probability.
The efficiency ratio, $R_E$, is the ratio of the computational effort, $E_{without}$, without automatically defined functions to the computational effort, $E_{with}$, with automatically defined functions. For this problem,
$R_E = \frac{\text{Effort without ADFs}}{\text{Effort with ADFs}} = \frac{E_{without}}{E_{with}} = \frac{1{,}176{,}000}{2{,}220{,}000} = 0.53$.
Since the efficiency ratio is less than 1 for this problem, the runs using automatically defined functions require more computational effort than the runs without automatically defined functions. That is, automatically defined functions are not advantageous in terms of the computational effort required to solve this particular problem.
Table 4.5 Comparison table for the two-boxes problem.
                                   Without automatically    With automatically
                                   defined functions        defined functions
Average structural complexity S    17.8                     33.5
Computational effort E             1,176,000                2,220,000
Figure 4.20 Summary graphs for the two-boxes problem.
The advantages and disadvantages of the method described above for measuring computational effort, E, and the characteristics of an alternative measure based on wallclock time are discussed in sections 8.16, 9.14, and 10.2.
4.12 SUMMARY
Table 4.5 compares the average structural complexities, $S_{without}$ and $S_{with}$, and the computational efforts, $E_{without}$ and $E_{with}$, for the two-boxes problem with and without automatically defined functions. This table is called the comparison table for the problem. A total of 27 comparison tables similar to this table will appear throughout this book.
The information in the comparison table can be summarized in terms of the structural complexity ratio, $R_S$, and the efficiency ratio, $R_E$, for the problem. As it happens, both ratios are 0.53 for the two-boxes problem.
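Both ratios follow directly from the four entries of the comparison table. As a small sketch (the function name is hypothetical, and the structural complexity ratio is assumed to be formed the same way as the efficiency ratio, with the value without ADFs in the numerator):

```python
def adf_ratio(without_adfs: float, with_adfs: float) -> float:
    """Ratio of a measure without ADFs to the same measure with ADFs.

    A value above 1 means the runs with ADFs did better on that measure.
    """
    return without_adfs / with_adfs

# Entries of table 4.5 for the two-boxes problem.
structural_complexity_ratio = adf_ratio(17.8, 33.5)
efficiency_ratio = adf_ratio(1_176_000, 2_220_000)

print(round(structural_complexity_ratio, 2), round(efficiency_ratio, 2))
```

The two ratios agree only after rounding to two decimal places; the underlying quotients differ slightly.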
Figure 4.20 shows this information and these two ratios as a pair of bar graphs. A total of 26 summary graphs similar to this figure will appear throughout this book.
As can be seen from the summary graphs, automatically defined functions yielded neither a more parsimonious result nor a reduction in the computational effort required for the two-boxes problem.
One might not expect that it would be possible to simultaneously breed
both a function definition and a calling program dynamically during a
run in order to solve a problem. However, genetic programming with
automatically defined functions is capable of solving the two-boxes problem. Nonetheless, it is disappointing that the two comparative ratios of
average structural complexity and computational effort indicate that there
is no advantage to automatically defined functions for this particular problem. This disappointing conclusion apparently results from the fact that
the underlying regularity in the two-boxes problem consists of only two
invocations of a common calculation consisting of only two multiplications. The benefits of exploiting the regularity inherent in this problem
environment do not outweigh the overhead associated with automatically
defined functions. The tide will turn in the next chapter.
Problems that Straddle the Breakeven Point
for Computational Effort
The previous chapter explained the technique of automatically defined functions and illustrated it with the two-boxes problem. This simple problem provided the opportunity to define a useful function (whether it be volume, area, negative volume, or negative area) dynamically during the run and then use it in solving the problem. Unfortunately, when we compared genetic programming with and without automatically defined functions, we were disappointed to find that genetic programming with automatically defined functions did not exhibit any advantage in the two-boxes problem in terms of the average size of the evolved solutions (or satisfactory results) or the number of fitness evaluations required to yield a solution (or satisfactory result) with 99% probability. This chapter will reach the opposite conclusion as to the number of fitness evaluations and demonstrate that automatically defined functions can reduce the computational effort required to solve a problem.
In this chapter genetic programming will be used to solve both a simpler and a scaled-up version of four problems, both with and without automatically defined functions. For each of these 16 combinations, multiple runs will be made. Sixteen performance curves will be created. When these 16 performance curves are analyzed as to the number of fitness evaluations required to yield a solution (or satisfactory result) to the problem with 99% probability, automatically defined functions will prove to be non-beneficial for the simpler versions of the four problems, but beneficial for the scaled-up versions.
In other words, each of the four problems straddles an apparent breakeven point for computational effort. The simpler versions of the four problems do not have enough regularity, symmetry, homogeneity, and modularity in their problem environments to overcome the overhead apparently associated with automatically defined functions. Like the two-boxes problem, they are too simple to make automatically defined functions beneficial. In contrast, the scaled-up versions of the four problems are sufficiently difficult to benefit from automatically defined functions.
The scaling is done in four domains: the order of a polynomial, the arity (i.e., number of arguments) of a Boolean function, the number of harmonics of a sinusoidal function, and the frequency of reuse of the constant π in a multi-term algebraic expression.
Problem 1: The simpler version of the first problem is a symbolic regression (system identification) of a quintic polynomial $x^5 - 2x^3 + x$ involving one independent variable, x; the scaled-up version is a symbolic regression of the sextic polynomial $x^6 - 2x^4 + x^2$.
Problem 2: The simpler version is a symbolic regression of the Boolean 5-symmetry function; the scaled-up version involves the 6-symmetry function.
Problem 3: The simpler version is a symbolic regression of sin x + sin 2x + sin 3x. One additional harmonic, sin 4x, is added to this expression so that the target for the scaled-up version of this problem becomes sin x + sin 2x + sin 3x + sin 4x.
Problem 4: The simpler version is a symbolic regression of the two-term expression $x/\pi + x^2/\pi^2$ in which the constant value π is used three times. One additional term, $2\pi x$, is added to this expression so that the target for the scaled-up version of this problem becomes the three-term expression $x/\pi + x^2/\pi^2 + 2\pi x$. The modularity in this problem environment is that π is used three or four times in the two different target expressions. This problem can be efficiently solved by finding the constant value π (or some constant value related to π) that can be repeatedly invoked in the solution.
The breakeven point straddled by the four problems in this chapter is the breakeven point for computational effort. There appears to be another breakeven point for parsimony (inverse average structural complexity) and yet another for wallclock time. The scaled-up versions of two of the four problems in this chapter appear to be on the beneficial side of the breakeven point for average structural complexity. Starting with the next chapter, almost all of the problems in this book will be on the beneficial side of the breakeven point for average structural complexity.
5.1 SEXTIC VERSUS QUINTIC POLYNOMIAL
This section considers the problem of symbolic regression of the quintic (order 5) polynomial
$x^5 - 2x^3 + x = x(x-1)^2(x+1)^2$
and the scaled-up version of this problem involving the sextic (order 6) polynomial
$x^6 - 2x^4 + x^2 = x^2(x-1)^2(x+1)^2$.
These two target functions for symbolic regression are scaled in terms of the order of the polynomials. As can be seen from the above factorization of these two polynomials, they also differ in that the squaring function is invoked twice for the quintic polynomial and three times for the sextic. They also differ in that there are only two repeated roots for the quintic polynomial, but three for the sextic. Consequently, the sextic polynomial has a slightly greater amount of potentially exploitable regularity and modularity than the quintic.
Figure 5.1 Sextic polynomial $x^6 - 2x^4 + x^2$ in the interval [-1.0, +1.0].
We consider the scaled-up version of each of the four problems first.
5.1.1 Sextic Polynomial $x^6 - 2x^4 + x^2$
Figure 5.1 graphs the values of the sextic polynomial $x^6 - 2x^4 + x^2$ in the
interval [-1, +1].
5.1.1.1 Preparatory Steps without ADFs
Genetic programming is capable of solving this problem of symbolic
regression.
The preparatory steps for this problem are straightforward and similar to the two-boxes problem.
There is only one independent variable, x, in this problem. In addition, certain ephemeral random constants are included in the terminal set of this problem. During the creation of the individual programs in the initial random population (generation 0), whenever the ephemeral random constant, $\Re$, is chosen as the terminal to be located at a point of a program, a constant from a designated range is independently, separately, and randomly generated and inserted into the program at that point. The six different kinds of random constants used in this book ($\Re_{reals}$, $\Re_{bigger-reals}$, $\Re_{real-vector}$, $\Re_{v8}$, $\Re_{Boolean}$, and $\Re_{ternary}$) are defined when they are first encountered and listed in appendix A. For this problem, random floating-point constants, $\Re_{reals}$, are used. Whenever a floating-point random constant, $\Re_{reals}$, is chosen as the terminal to be located at a point of a program, a floating-point number between -1.000 and +1.000 is independently, separately, and randomly generated and inserted into the program at that point. Floating-point random constants are generated with a granularity of 0.001 in the sense that each of the 2,001 floating-point constants between -1.000 and +1.000 is equally likely to be generated.
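One way to realize the granularity described above is to draw an integer uniformly and scale it. The following is only a sketch under that assumption; the function name is hypothetical and the book does not prescribe a particular generator.

```python
import random

def random_real_constant(rng: random.Random) -> float:
    """Draw one floating-point random constant in [-1.000, +1.000].

    randint is inclusive at both ends, so each of the 2,001 values
    -1.000, -0.999, ..., +1.000 is equally likely.
    """
    return rng.randint(-1000, 1000) / 1000.0

rng = random.Random(0)  # seeded only to make the sketch repeatable
samples = [random_real_constant(rng) for _ in range(10_000)]
print(samples[:5])
```

Each call produces a fresh value, which mirrors the requirement that every occurrence of the ephemeral constant in generation 0 is generated independently and separately.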
Table 5.1 Tableau without ADFs for the sextic polynomial $x^6 - 2x^4 + x^2$.
Objective: Find a program that produces the given value of the sextic polynomial $x^6 - 2x^4 + x^2$ as its output when given the value of the one independent variable, x, as input.
Terminal set without ADFs: X and the floating-point random constants, $\Re_{reals}$.
Function set without ADFs: +, -, *, and %.
Fitness cases: 50 random values of x from the interval [-1.0, +1.0].
Raw fitness: The sum, over the 50 fitness cases, of the absolute value of the error between the value returned by the program and the given value of the dependent variable.
Standardized fitness: Same as raw fitness.
Hits: The number of fitness cases (between 0 and 50) for which the raw fitness is less than 0.01 (the hits criterion).
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program scores the maximum number of hits (i.e., 50).
Table 5.1 summarizes the key features of the problem of symbolic regression for $x^6 - 2x^4 + x^2$ without automatically defined functions.
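The fitness measure in the tableau can be sketched in a few lines. This is an illustration, not the book's implementation: the function names are hypothetical, the hits criterion is interpreted (an assumption) as the absolute error on an individual fitness case being below 0.01, and the two candidate programs are chosen only to show a perfect and an imperfect score.

```python
import random

def target(x: float) -> float:
    """The sextic target x^6 - 2x^4 + x^2."""
    return x**6 - 2 * x**4 + x**2

def evaluate(program, cases, tolerance=0.01):
    """Return (raw fitness, hits) for a program over the fitness cases."""
    errors = [abs(program(x) - y) for x, y in cases]
    raw_fitness = sum(errors)                           # sum of absolute errors
    hits = sum(1 for e in errors if e < tolerance)      # cases within tolerance
    return raw_fitness, hits

rng = random.Random(1)  # seeded only to make the sketch repeatable
cases = [(x, target(x)) for x in (rng.uniform(-1.0, 1.0) for _ in range(50))]

def exact_candidate(x: float) -> float:
    """(x - x^3)^2, algebraically equal to the target."""
    return (x - x * x * x) * (x - x * x * x)

def poor_candidate(x: float) -> float:
    """x^2, which matches the target only near x = 0."""
    return x * x

exact_raw, exact_hits = evaluate(exact_candidate, cases)
poor_raw, poor_hits = evaluate(poor_candidate, cases)
print(exact_hits, poor_hits)
```

Under the success predicate of the tableau, only a program scoring 50 hits, such as the first candidate, would end a run successfully.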
5.1.1.2 Results without ADFs
We first consider the problem of the sextic polynomial $x^6 - 2x^4 + x^2$ using genetic programming without automatically defined functions.
Occasionally genetic programming produces an algebraically correct solution to a problem. For example, in one run, the following 100%-correct solution emerged in generation 5:
(* (- X (* (* X X) X)) (- X (* (* X X) X))).
More typically, genetic programming produces a good approximation to the target curve. The following best-of-run individual satisfying the success predicate of this problem (i.e., scoring 50 hits out of 50) from generation 37 is an example of such an approximation:
(% (% (* (* X 0.571) (* (- (* (+ (% 0.634094 0.68469) (+ (+ X X)
-0.5992)) (* (* (+ (% 0.634094 0.68469) (+ X -0.5992)) (* (%
0.354904 -0.7549) (* X 0.571))) (- X 0.395493))) -0.4665)
0.150497)) (+ (% 0.0211945 X) (+ X (% 0.0211945 X)))) (+ (-
0.112394 0.036392) (- (* (% -0.116905 X) (+ (% -0.116905 X) (*
(* -0.1549 0.141205) (% (% 0.354904 -0.7549) (- (* -0.5297 X) (*
(+ (% 0.354904 -0.7549) (* (+ (% 0.634094 0.68469) (+ X
-0.5992)) (* (% 0.354904 -0.7549) (* X 0.395493)))) (* 0.6823 (*
-0.5297 X)))))))) (* (+ (% 0.634094 0.68469) (+ X -0.5992)) (*
(* (+ (% 0.634094 0.68469) (+ X -0.5992)) (* (% 0.354904
-0.7549) (- X 0.395493))) (- X 0.395493)))))).
Figure 5.2 Performance curves for the symbolic regression of the sextic polynomial $x^6 - 2x^4 + x^2$ showing that $E_{without}$ = 1,440,000 without ADFs.
Panel: Without Defined Functions. Horizontal axis: Generation. Labeled points: (5, 5.3%) and (50, 42%).
The average structural complexity, $S_{without}$, of best-of-run programs from the eight successful runs (out of the 19 runs made) for the problem of symbolic regression for $x^6 - 2x^4 + x^2$ is 79.8 points without automatically defined functions.
Figure 5.2 presents the performance curves based on these 19 runs of the problem of symbolic regression for $x^6 - 2x^4 + x^2$ without automatically defined functions. The cumulative probability of success, P(M,i), is 42% by generation 39 and is still 42% by generation 50 (thus yielding eight satisfactory results from the 19 runs). The two numbers in the oval indicate that if this problem is run through to generation 39, processing a total of $E_{without}$ = 1,440,000 individuals (i.e., 4,000 x 40 generations x 9 runs) is sufficient to yield a satisfactory result for this problem with 99% probability.
5.1.1.3 Preparatory Steps with ADFs
We now consider the problem of the symbolic regression of the sextic polynomial $x^6 - 2x^4 + x^2$ using automatically defined functions.
The simplest architecture for an overall program employing automatically defined functions consists of one result-producing branch and one function-defining branch, so we adopt this architecture for this problem. Since there is only one independent variable, x, associated with this problem, it is appropriate that the automatically defined function take one argument.
Table 5.2 Tableau with ADFs for the sextic polynomial $x^6 - 2x^4 + x^2$.
Objective: Find a program that produces the given value of the sextic polynomial $x^6 - 2x^4 + x^2$ as its output when given the value of the one independent variable, x, as input.
Architecture of the overall program with ADFs: One result-producing branch and one one-argument function-defining branch.
Parameters: Branch typing.
Terminal set for the result-producing branch: X and the floating-point random constants, $\Re_{reals}$.
Function set for the result-producing branch: +, -, *, and % and the one-argument defined function ADF0.
Terminal set for the function-defining branch ADF0: The dummy variable ARG0 and the floating-point random constants, $\Re_{reals}$.
Function set for the function-defining branch ADF0: +, -, *, and %.
This problem can appropriately employ the same straightforward approach to choosing the terminal set and function set for the two branches as the two-boxes problem. First, the terminal set, $\mathcal{T}_{rpb}$, of the result-producing branch is the same as the terminal set, $\mathcal{T}$, of the problem when automatically defined functions were not being used (i.e., the actual variable of the problem, x, plus the random constants). Second, the function set, $\mathcal{F}_{rpb}$, of the result-producing branch is the union of the available automatically defined functions (just ADF0 here) and the function set, $\mathcal{F}$, that was used when automatically defined functions were not being used. Third, the terminal set, $\mathcal{T}_{adf}$, of the function-defining branch consists of as many dummy variables as the chosen arity of the automatically defined function (plus any random constants). Since the automatically defined function has just one argument here, $\mathcal{T}_{adf}$ consists of just ARG0 (plus the random constants). Fourth, the function set, $\mathcal{F}_{adf}$, of the function-defining branch is the same as the function set, $\mathcal{F}$, of the problem when automatically defined functions were not being used.
Table 5.2 summarizes the key features of the problem of symbolic regression for $x^6 - 2x^4 + x^2$ with automatically defined functions.
5.1.1.4 Results with ADFs
A human programmer writing a program whose output is to be the value of the sextic polynomial $x^6 - 2x^4 + x^2$ might notice that $x^6 - 2x^4 + x^2$ is equal to
$x^2(x-1)^2(x+1)^2$,
and that the square is taken on three occasions in this expression. The programmer might then write the following:
;;;- definition of the one-argument function "square"-
(progn (defun square (arg0)
         (values (* arg0 arg0)))
;;;- main program for computing the value of sextic
;;; polynomial-
       (values (* (square x) (square (- x 1))
                  (square (+ x 1))))).
Figure 5.3 100%-correct best-of-run program from generation 10 for the symbolic regression of the sextic polynomial $x^6 - 2x^4 + x^2$ with ADFs.
As previously mentioned, occasionally genetic programming produces an algebraically correct solution to a problem. For example, in one run, an algebraically correct solution to this problem emerged in generation 10:
(progn (defun ADF0 (ARG0)
         (values (* (- (+ ARG0 ARG0) ARG0)
                    (- (% ARG0 ARG0) (* ARG0 ARG0)))))
       (values (* (ADF0 X) (+ (ADF0 X) (- X X))))).
This genetically evolved solution is not equivalent to the seven-line program that the human programmer might have written; however, it exploits the same regularity in the problem environment in a somewhat different way. This particular genetically evolved solution reverses the presumed roles of the function-defining branch and the result-producing branch. The genetically evolved ADF0 is equivalent to $x - x^3$. The result-producing branch calls on ADF0 twice with the same argument, x, and then multiplies the results together to produce $x^6 - 2x^4 + x^2$.
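The claim that this evolved program is algebraically correct can be checked by transcribing it. The sketch below is a hand translation into Python, not output of genetic programming; it assumes that the protected division function % returns 1 when the divisor is zero, and the Python function names are hypothetical.

```python
def protected_div(a: float, b: float) -> float:
    """Protected division %: assumed to return 1 when dividing by zero."""
    return 1.0 if b == 0 else a / b

def adf0(arg0: float) -> float:
    # (* (- (+ ARG0 ARG0) ARG0) (- (% ARG0 ARG0) (* ARG0 ARG0)))
    return ((arg0 + arg0) - arg0) * (protected_div(arg0, arg0) - arg0 * arg0)

def result_producing_branch(x: float) -> float:
    # (* (ADF0 X) (+ (ADF0 X) (- X X)))
    return adf0(x) * (adf0(x) + (x - x))

def sextic(x: float) -> float:
    return x**6 - 2 * x**4 + x**2

# Compare the evolved program with the target on a grid over [-1, +1].
points = [i / 100 for i in range(-100, 101)]
max_error = max(abs(result_producing_branch(x) - sextic(x)) for x in points)
print(max_error)
```

The first factor of ADF0 reduces to ARG0 and the second to 1 - ARG0^2, so the function computes x - x^3, and the term (- X X) in the result-producing branch contributes nothing.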
Figure 5.3 shows this 100%-correct best-of-run individual with automatically defined functions from generation 10 as a rooted, point-labeled tree with ordered branches. The function-defining branch is on the left of this figure and the result-producing branch is on the right.
This solution is a hierarchical decomposition of the problem. First, genetic programming discovered a decomposition of the overall problem into a subproblem of finding the algebraic square root of the target sextic polynomial (i.e., $x - x^3$). Then, genetic programming solved the subproblem. Third, genetic programming assembled the results of solving the subproblem into a solution to the overall problem by multiplying together the results of two calls to the defined function ADF0.
A second example illustrates the fact that genetic programming usually produces a good approximation to the target curve, rather than an algebraically correct solution. The following good approximation to the target sextic curve scoring 50 hits (out of 50) emerged in generation 19 of one run:
(progn (defun ADF0 (ARG0)
(values (% (* (* -0.3842 ARG0) (- (% ARG0 ARG0) ARG0))
(% (- (+ 0.60989 -0.1008) (% ARG0 ARG0)) (- (- ARG0
-0.0236053) (* (* -0.3842 ARG0) (+ (% (* (* -0.3842
ARG0) (% 0.50209 (+ (% (- ARG0 ARG0) (* 0.132095
ARG0)) (% -0.495803 -0.183501)))) (% (- (+ 0.3185
-0.2695) (* (* -0.3842 ARG0) (- (% ARG0 ARG0) ARG0)))
(+ 0.60989 -0.1008))) ARG0)))))))
(values (ADF0 (- (% X X) (* X X))))).
Here ADF0 simplifies (if we may use that term) to
$-0.78264\,Arg0^3 + 0.76417\,Arg0^2 + 0.0184743\,Arg0 + 0.30069\,Arg0^3 (1 - Arg0)\,\frac{0.0727996 + 0.3842\,Arg0\,(1 - Arg0)}{0.1091 + 0.3842\,Arg0\,(1 - Arg0)}$.
The result-producing branch invokes ADF0 with an argument of $1 - x^2$.
A third example shows that when automatically defined functions are used, calls to defined functions often contain arguments that themselves consist of calls to defined functions. The following individual scoring 50 hits emerged in generation 16 of one run:
(progn (defun ADF0 (ARG0)
(values (* (+ (* ARG0 0.6694) ARG0) (* (+ (+ (* (- ARG0
ARG0) (- (% (+ -0.8254 ARG0) ARG0) (+ ARG0 (% (% ARG0
ARG0) (+ ARG0 0.45529))))) ARG0) (% (% ARG0 ARG0) (-
-0.1206 ARG0))) (- ARG0 0.61169)))))
(values (ADF0 (* (+ X (% (* X 0.617294) (ADF0 (ADF0 (* (*
X 0.611294) (* 0.7631 X)))))) (* X 0.611294))))).
The average structural complexity, $S_{with}$, of the best-of-run programs from seven successful runs (out of 13 runs) of the problem of symbolic regression for $x^6 - 2x^4 + x^2$ is 81.1 points with automatically defined functions.
Figure 5.4 presents the performance curves based on these 13 runs of the problem of symbolic regression for $x^6 - 2x^4 + x^2$ with automatically defined functions. The cumulative probability of success, P(M,i), is 54% by generation 48 and is still 54% by generation 50 (thus yielding seven satisfactory results from the 13 runs). The two numbers in the oval indicate that if this problem is run through to generation 48, processing a total of $E_{with}$ = 1,176,000 individuals (i.e., 4,000 x 49 generations x 6 runs) is sufficient to yield a satisfactory result for this problem with 99% probability.
Figure 5.4 Performance curves for the symbolic regression of the sextic polynomial $x^6 - 2x^4 + x^2$ showing that $E_{with}$ = 1,176,000 with ADFs.
Panel: With Defined Functions. Horizontal axis: Generation. Labeled points: (10, 7.7%) and (50, 54%).
Table 5.3 Comparison table for the sextic polynomial $x^6 - 2x^4 + x^2$.
                                   Without automatically    With automatically
                                   defined functions        defined functions
Average structural complexity S    79.8                     81.1
Computational effort E             1,440,000                1,176,000
5.1.1.5 Comparison with and without ADFs
Table 5.3 compares the average structural complexity, $S_{without}$ and $S_{with}$, and the computational effort, $E_{without}$ and $E_{with}$, for the problem of symbolic regression for $x^6 - 2x^4 + x^2$ with automatically defined functions and without them. As can be seen, the computational effort, $E_{with}$, required with automatically defined functions is less than the computational effort, $E_{without}$, without them. That is, automatically defined functions are beneficial for the scaled-up version of this problem. On the other hand, the average structural complexity is slightly less favorable with automatically defined functions than without them.
Figure 5.5 summarizes the information in this comparison table and shows a structural complexity ratio, $R_S$, of 0.98 and an efficiency ratio, $R_E$, of 1.22. The fact that the efficiency ratio is greater than 1 indicates that automatically defined functions are beneficial for the scaled-up version of this problem.
Figure 5.5 Summary graphs for the symbolic regression of the sextic polynomial $x^6 - 2x^4 + x^2$.
Figure 5.6 Quintic polynomial $x^5 - 2x^3 + x$ in the interval [-1, +1].
5.1.2 Quintic Polynomial $x^5 - 2x^3 + x$
When we perform symbolic regression on a similar, but simpler polynomial, we find that automatically defined functions are not beneficial as to the number of fitness evaluations.
Figure 5.6 graphs the values of the quintic polynomial $x^5 - 2x^3 + x$ in the interval [-1, +1].
5.1.2.1 Preparatory Steps without ADFs
The tableau for the quintic version of this problem is identical to the tableau for the sextic version (except that the target function is the quintic polynomial $x^5 - 2x^3 + x$) and will not be shown here.
The raw fitness of an individual program in the population is the sum, over the 50 values of the independent variable, $x_i$, of the absolute value of the error between the value returned by the program and the target value, $y_i$, of the dependent variable. The only difference between a run of the symbolic regression problem for the sextic polynomial versus a run for the quintic polynomial is the fitness measure, and the only difference in the fitness measure lies in the fitness cases consisting of pairs of values $(x_i, y_i)$. All other aspects of the runs of genetic programming for the two problems are the same. In particular, the creation of the initial random generation of the population is identical. Each program in generation 0 is composed of the independent variable, X, random constants, +, -, *, and protected division %. When the problem involves the sextic polynomial, the fitness measure uses a sampling of values from the sextic curve and the result produced by genetic programming is a symbolic expression that equals (or at least approximately equals) $x^6 - 2x^4 + x^2$. When the problem involves the quintic polynomial, the fitness measure uses a sampling of values from the quintic curve and the result is a symbolic expression that mimics $x^5 - 2x^3 + x$. Thus, it is the fitness measure that determines the programmatic structure that is produced by genetic programming.
Figure 5.7 shows that, starting with the same initial population at generation 0, genetic programming gives rise to two different evolved solutions when it operates with two different fitness measures. The two different structures emerge from the same starting population as a consequence of the fitness measure.
Figure 5.7 Structure arises from fitness.
5.1.2.2 Results without ADFs
In one run, the following algebraically correct solution emerged in generation 15:
(* (* (- (% X X) (* X X)) X) (- (% X X) (* X X))).
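That this S-expression is algebraically correct can be seen by simplifying it by hand. The short derivation below assumes that the protected division (% X X) evaluates to 1, which holds for nonzero x by ordinary division and at x = 0 under the usual convention that protected division returns 1 for a zero divisor.

```latex
\underbrace{\bigl(1 - x^{2}\bigr)}_{(-\;(\%\;X\;X)\;(*\;X\;X))}
\cdot\, x \,\cdot
\underbrace{\bigl(1 - x^{2}\bigr)}_{(-\;(\%\;X\;X)\;(*\;X\;X))}
\;=\; x\,\bigl(1 - x^{2}\bigr)^{2}
\;=\; x\,\bigl(1 - 2x^{2} + x^{4}\bigr)
\;=\; x^{5} - 2x^{3} + x .
```

The subexpression 1 - x^2 appears twice in the evolved program, which is the repeated calculation that an automatically defined function could in principle capture.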
\Atrhen automatically defined functions are not being used, the average
structural complexlty, Switrrout, of the best-of-run programs from the 2L successful runs (out of 24 runs) of the problem of spnbolic regression for the
quintic polynomial xs -2x3 + r runs is 69.0 points.
Figure 5.8 presents the performance curves based on these 24 runs of the
problem of symbolic regressionof the quinticpolynomial *5 - 2*3 + .r without
Figure 5.8 Performance curves for the symbolic regression of the quintic polynomial x^5 - 2x^3 + x showing that E_without = 396,000 without ADFs.
automatically defined functions. The cumulative probability of success,
P(M,i), is 79% by generation 32 and is 87.5% by generation 50. The two numbers in the oval indicate that if this problem is run through to generation 32,
processing a total of E_without = 396,000 individuals (i.e., 4,000 x 33 generations
x 3 runs) is sufficient to yield a satisfactory result for this problem with 99%
probability.
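The arithmetic behind such a figure can be sketched in Python. The product M x (i + 1) x R(z) is exactly the "4,000 x 33 generations x 3 runs" computation quoted above. The formula used for the number of runs, R(z) = ceil(log(1 - z) / log(1 - P(M,i))), is not spelled out in this section and is restated here as an assumption; the function names are hypothetical.

```python
import math

def runs_required(p_success, z=0.99):
    """R(z): independent runs needed so that at least one succeeds with probability z."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if p_success == 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def individuals_to_process(m, i, p_success, z=0.99):
    """I(M, i, z) = M * (i + 1) * R(z): population size times generations
    (0 through i) times the number of runs."""
    return m * (i + 1) * runs_required(p_success, z)

# Quintic polynomial without ADFs: M = 4,000, P(M,i) = 79% at generation 32.
effort_without = individuals_to_process(4000, 32, 0.79)
```

The computational effort E reported for a problem is the smallest value of I(M, i, z) over the generations i; the sketch evaluates only the single generation named in the text.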
5.1.2.3 Results with ADFs
We omit illustrative solutions for this simpler version of the problem and
proceed directly to the comparative statistics.
When automatically defined functions are being used, the average structural complexity, S_with, of the best-of-run programs from the 33 successful
runs (out of 61 runs) of the problem of symbolic regression for the quintic
polynomial x^5 - 2x^3 + x is 64.0 points.
Figure 5.9 presents the performance curves based on these 61 runs of
the problem of symbolic regression of the quintic polynomial x^5 - 2x^3 + x
with automatically defined functions. The cumulative probability of success, P(M,i), is 54% by generation 49 and is 54% by generation 50. The
two numbers in the oval indicate that if this problem is run through to
generation 49, processing a total of E_with = 1,200,000 individuals (i.e., 4,000
x 50 generations x 6 runs) is sufficient to yield a satisfactory result for this
problem with 99% probability.
5.1.2.4 Comparison with and without ADFs
Table 5.4 compares the average structural complexity, S_without and S_with, and
the computational effort, E_without and E_with, for the symbolic regression of
x^5 - 2x^3 + x with automatically defined functions and without them. As can
Figure 5.9 Performance curves for the symbolic regression of the quintic polynomial x^5 - 2x^3 + x showing that E_with = 1,200,000 with ADFs.
Table 5.4 Comparison table for the quintic polynomial x^5 - 2x^3 + x.
                                   Without ADFs    With ADFs
Average structural complexity S    69.0            64.0
Computational effort E             396,000         1,200,000
Figure 5.10 Summary graphs for the quintic polynomial x^5 - 2x^3 + x.
be seen, the situation for the quintic polynomial x^5 - 2x^3 + x is the opposite
of that for the sextic polynomial. The computational effort, E_with, required with automatically defined functions is greater than the computational effort, E_without,
without them. In other words, automatically defined functions are not beneficial for the simpler version of this problem.
Figure 5.10 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 1.07 and an efficiency ratio, R_E, of
0.33. The fact that the efficiency ratio is much less than 1 indicates that
automatically defined functions are not beneficial for the simpler version
of this problem.
It appears that the simpler quintic polynomial is on the non-beneficial side of
an apparent breakeven point for computational effort and that the higher-order sextic polynomial is on the beneficial side of this breakeven point. The
sextic polynomial is not, of course, a particularly challenging target function
for floating-point symbolic regression using genetic programming. The comparative simplicity of sextic polynomials in relation to the vast space of polynomials that might be a target for symbolic regression suggests that
automatically defined functions may enhance the performance of symbolic
regression for all but the simplest target polynomials.
5.2 THE BOOLEAN 6-SYMMETRY VERSUS 5-SYMMETRY
The Boolean symmetry function is often used as a benchmark in the fields of
neural networks and machine learning. Boolean functions are attractive for
experiments in genetic programming for several reasons. First, it is often possible to understand how the structure of a program contributes to the overall
performance of the program for a Boolean function. Second, there are few
practical obstacles (e.g., overflows, underflows) to computer implementation
of evolved Boolean programs. Third, the easily quantifiable search space facilitates analysis of results. Fourth, no time-consuming simulations are required to measure fitness. Fifth, the number of possible fitness cases is finite
and small enough, for many problems, to permit testing of 100% of the fitness
cases. Sixth, Boolean problems are amenable to the several optimizations (described in appendix E) that enable problems to be run with a relatively modest amount of computer time.
5.2.1 The Boolean 6-Symmetry Problem
The Boolean symmetry function of k Boolean arguments returns T (true) if its
Boolean arguments are symmetric, and otherwise returns NIL (false). Symmetry is determined by verifying that the first argument matches the last
argument, the second argument matches the second-to-last argument, and so
forth. If the number of arguments is odd, no comparison is made of the middle
argument.
For example, if the six arguments are 1, 0, 1, 1, 0, and 1 (where we use 1 to
denote T and 0 to denote NIL), then the 6-symmetry function returns T. On
Figure 5.11 Boolean 6-symmetry function with inputs of 1, 1, 0, 0, 1, and 1 and an output of 1.
the other hand, the 6-symmetry function returns NIL if its arguments are 0, 0,
1, 1, 0, and 1. The 5-symmetry function returns T if its five arguments are 0, 0,
1, 0, and 0.
Figure 5.11 shows that the output of the 6-symmetry function is 1 for inputs of 1, 1, 0, 0, 1, and 1.
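The definition of the k-symmetry function can be written directly as a short Python function. This is only an illustrative sketch (the name symmetry and the use of a Python sequence of 0/1 values are assumptions), but it reproduces the worked examples given in the text.

```python
def symmetry(bits):
    """Return True if the Boolean arguments are symmetric.

    The first argument is compared with the last, the second with the
    second-to-last, and so forth; for an odd number of arguments the
    middle argument is never compared.
    """
    k = len(bits)
    return all(bits[j] == bits[k - 1 - j] for j in range(k // 2))

example_true_6 = symmetry([1, 0, 1, 1, 0, 1])
example_false_6 = symmetry([0, 0, 1, 1, 0, 1])
example_true_5 = symmetry([0, 0, 1, 0, 0])
figure_5_11 = symmetry([1, 1, 0, 0, 1, 1])
```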
The symmetry function is suitable for our purposes here because the
pairwise matching of the inputs imparts a certain amount of regularity and
modularity to this problem environment. The 5-symmetry and 6-symmetry
functions are scaled in terms of their number of arguments. The 6-symmetry
function has a slightly greater amount of potentially exploitable regularity
and modularity than the 5-symmetry function because only two matches are
performed in computing the 5-symmetry function whereas three matches are
performed for the 6-symmetry function.
5.2.1.1 Preparatory Steps without ADFs
In applying genetic programming to the Boolean 6-symmetry function, the
terminal set, T, consists of the six Boolean arguments, so that
T = {D0, D1, D2, D3, D4, D5}.
The following function set consisting of four primitive Boolean functions
satisfies the sufficiency requirement (because it is computationally complete)
and satisfies the closure requirement:
F = {AND, OR, NAND, NOR}
with an argument map of
{2, 2, 2, 2}.
In addition, this function set is convenient in the sense that it produces programs that are relatively easy to understand.
The set of possible fitness cases for this problem consists of the 2^6 = 64
combinations of the six Boolean arguments.
The raw fitness of a program is the number of fitness cases for which the
program returns the correct value. Raw fitness ranges between 0 and 64 and
a larger value is better.
The standardized fitness of a program is the sum, over the 64 fitness cases,
of the Hamming distance (error) between the value returned by the program
and the correct value of the Boolean function. Standardized fitness ranges
Table 5.5 Tableau without ADFs for Boolean 6-symmetry problem.
Objective: Find a program that produces the value of the Boolean 6-symmetry function as its output when given the value of the six independent Boolean variables as input.
Terminal set without ADFs: D0, D1, D2, D3, D4, and D5.
Function set without ADFs: AND, OR, NAND, and NOR.
Fitness cases: All 2^6 = 64 combinations of the six Boolean arguments D0, D1, D2, D3, D4, and D5.
Raw fitness: The number of fitness cases for which the value returned by the program equals the correct value of the 6-symmetry function.
Standardized fitness: The standardized fitness of a program is the sum, over the 2^6 = 64 fitness cases, of the Hamming distance (error) between the value returned by the program and the correct value of the 6-symmetry function.
Hits: Same as raw fitness.
Wrapper: None.
Parameters: M = 16,000. G = 51.
Success predicate: A program scores the maximum number of hits.
between 0 and 64 and a value closer to 0 is better. Raw fitness is 64 minus
standardized fitness.
We chose a population size, M, of 16,000 based on our view of the likely
difficulty of this problem (and to match the population size used for Boolean
problems throughout chapter 6).
Table 5.5 summarizes the key features of the problem of symbolic
regression of the Boolean 6-symmetry function without automatically
defined functions.
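The raw and standardized fitness measures for this problem can be sketched in Python as follows. Candidate programs are modeled here as ordinary Python callables of six Boolean arguments, which is an assumption made purely for illustration. Because each output is a single bit, the Hamming distance for one fitness case is either 0 or 1, so standardized fitness is simply the number of incorrect fitness cases.

```python
from itertools import product

def six_symmetry(d0, d1, d2, d3, d4, d5):
    """Target function: True when the six arguments are symmetric."""
    return d0 == d5 and d1 == d4 and d2 == d3

# All 2^6 = 64 combinations of the six Boolean arguments.
FITNESS_CASES = list(product([False, True], repeat=6))

def raw_fitness(program):
    """Number of fitness cases (hits) on which the program is correct."""
    return sum(program(*case) == six_symmetry(*case) for case in FITNESS_CASES)

def standardized_fitness(program):
    """Number of fitness cases on which the program is wrong (0 is best)."""
    return len(FITNESS_CASES) - raw_fitness(program)

def always_nil(d0, d1, d2, d3, d4, d5):
    """A trivial candidate that always returns NIL (False)."""
    return False

nil_raw = raw_fitness(always_nil)
nil_standardized = standardized_fitness(always_nil)
```

A program that always returns NIL already scores 56 hits out of 64, because only 8 of the 64 fitness cases are symmetric; the success predicate therefore demands all 64 hits.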
5.2.1.2 Results without ADFs
In one run of the Boolean 6-symmetry problem without automatically defined functions, the following 100%-correct best-of-run 145-point individual
emerged in generation 34:
(NOR (NOR (AND (NOR D2 D3) (NAND (NOR (AND D5 (NAND (AND D2 D3)
(NOR (NAND (NOR (AND D3 D3) (AND D3 D3)) (AND (NAND D0 D0) (NOR
D4 D1))) (OR D0 D5)))) (NAND D0 D0)) D0)) (NOR (AND (OR (NOR (OR
(NAND D4 D4) (NAND (AND D1 D0) (NAND D1 D0))) (AND (OR D5 D2)
D4)) (NAND D3 D2)) (NAND (NOR D2 D5) (NAND D2 D0))) (OR (OR
(NAND (AND D2 D3) D2) (NOR (AND D5 (NAND (NAND (NAND (AND D5
(NAND D0 D2)) (OR (NAND (AND D4 D1) (NAND D5 D2)) D1)) (OR (AND
D1 D4) (NOR D4 D1))) (NAND D3 D5))) (NAND D0 D0))) (NOR (AND D3
D3) (AND D3 D3))))) (NAND (NAND (AND D5 (NAND D0 D0)) (OR (NAND
(NOR (OR D4 D5) (NOR D2 D1)) (AND (AND (AND D2 D4) D0) D0)) D1))
(OR (AND D1 D4) (NOR D4 D1)))).
The average structural complexity, S_without, of the 100%-correct solutions
from the 23 successful runs (out of 43 runs) of the Boolean 6-symmetry problem is 143.0 points without automatically defined functions.
Figure 5.12 Performance curves for the Boolean 6-symmetry problem showing that E_without = 4,368,000 without ADFs.
Figure 5.12 presents the performance curves based on these 43 runs for
the problem of symbolic regression of the Boolean 6-symmetry function
without automatically defined functions. The cumulative probability of
success, P(M,i), is 49% by generation 38 and is 54% by generation 50. The
two numbers in the oval indicate that if this problem is run through to
generation 38, processing a total of E_without = 4,368,000 individuals (i.e.,
16,000 x 39 generations x 7 runs) is sufficient to yield a solution to this
problem with 99% probability.
5.2.1.3 Preparatory Steps with ADFs
We now consider the Boolean 6-symmetry problem using automatically defined functions.
Figure 5.13 Three-step top-down hierarchical approach applied to the 6-symmetry problem.
A human programmer writing a program for the 6-symmetry function
might conceivably employ the fact that any Boolean function can be written in disjunctive normal form and write a disjunction of clauses, each
consisting of the conjunction of the six Boolean arguments or their negations, for each of the eight combinations of the arguments that returns a value
of T. However, a human programmer would almost certainly not code
this problem in this tedious way. Instead, the programmer would probably write a subroutine EQV capable of testing for the equivalence (i.e., the
even-2-parity) of two Boolean arguments and then call this two-argument
subroutine with three different instantiations of the two dummy variables
(formal parameters). The programmer might write something like the following ten-line overall program:
 1 ;;;-definition of the two-argument equivalence function
 2 ;;; EQV (even-2-parity)-
 3 (progn (defun EQV (arg0 arg1)
 4          (values (OR (AND arg0 arg1)
 5                      (NOR arg0 arg1))))
 6 ;;;-main program for Boolean 6-symmetry of
 7 ;;; d0, d1, d2, d3, d4, and d5-
 8        (values (AND (AND (EQV d0 d5)
 9                          (EQV d1 d4))
10                     (EQV d2 d3))))
Lines 3 through 5 constitute the function-defining branch of this overall
program. This code implements the EQV function by returning T if the dummy
variables, ARG0 and ARG1, are either both T or both NIL.
Lines 8 through 10 constitute a main program that calls the two-argument
EQV function three times: first testing the equivalence of D0 and D5, then
testing D1 and D4, and finally testing D2 and D3.
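The same decomposition can be mirrored in Python and checked exhaustively against the definition of symmetry. This is a sketch rather than a translation of any particular LISP system; the helper is_palindrome and the variable names are introduced only for the check.

```python
from itertools import product

def eqv(arg0, arg1):
    """Two-argument equivalence: (OR (AND arg0 arg1) (NOR arg0 arg1))."""
    return (arg0 and arg1) or not (arg0 or arg1)

def six_symmetry(d0, d1, d2, d3, d4, d5):
    """Main program: three instantiations of EQV joined by AND."""
    return (eqv(d0, d5) and eqv(d1, d4)) and eqv(d2, d3)

def is_palindrome(bits):
    """Reference definition: the argument list reads the same in both directions."""
    return list(bits) == list(reversed(bits))

cases = list(product([False, True], repeat=6))
agreement = all(six_symmetry(*c) == is_palindrome(c) for c in cases)
true_cases = sum(six_symmetry(*c) for c in cases)
```

The count true_cases is the number of clauses that a disjunctive-normal-form solution would need, which is why the subroutine-based program is so much shorter.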
Figure 5.13 diagrams the way the above ten-line program applies the hierarchical three-step problem-solving process in its top-down form to the
6-symmetry problem. The original overall problem is at the left. In the step
labeled "decompose" near the top left of the figure, the original problem is
decomposed into one subproblem for determining the equivalence of two
Boolean arguments. In the step labeled "solve subproblem" in the top middle
of the figure, the subproblem is solved. Finally, in the step labeled "solve
original problem" near the top right, the solution of the subproblem is instantiated with three different pairs of Boolean arguments and these three results
are assembled using the AND function into a solution to the overall problem.
The ten-line program above can also be interpreted in terms of the bottom-up way of describing the hierarchical three-step problem-solving process. First,
one seeks to discover useful regularities at the lowest level of the problem
environment. In this problem, the useful regularities are the equivalence of
the first and last arguments, the equivalence of the second and second-to-last
arguments, and the equivalence of the middle two arguments. Second, one
changes the representation of the problem so that the problem becomes restated in terms of the regularities. In the recoding, each of the three designated pairs of the original independent variables is replaced by one new
independent variable. Specifically, D0 and D5 are replaced by the single bit
R0 indicating whether D0 and D5 are equivalent; D1 and D4 are replaced by
the single bit R1 indicating whether they are equivalent; and D2 and D3 are
replaced by the single bit R2 indicating whether they are equivalent. The
6-symmetry problem has six independent variables (and 2^6 = 64 fitness cases).
If we focus only on the new variables, table 5.6 shows the new problem
created by a change of representation that recodes the three designated
pairs of the original independent variables into the new independent variables R0, R1, and R2 for the 6-symmetry problem. There are eight different combinations of the new independent variables, R0, R1, and R2. There
is one value of the dependent variable associated with each. The problem
still has 64 fitness cases.
Table 5.6 The new problem created by a change of representation using the new
independent variables R0, R1, and R2 for the 6-symmetry problem.
Fitness case   R0    R1    R2    6-symmetry
0              NIL   NIL   NIL   NIL
1              NIL   NIL   T     NIL
2              NIL   T     NIL   NIL
3              NIL   T     T     NIL
4              T     NIL   NIL   NIL
5              T     NIL   T     NIL
6              T     T     NIL   NIL
7              T     T     T     T
After the change of representation, the problem is much simpler even though
the full problem now actually has nine independent variables (three of which
are related to the original six). It is solved when the result-producing branch
uses the simple conjunction (AND) of the three new independent variables,
R0, R1, and R2.
Figure 5.14 Three-step bottom-up hierarchical approach applied to the 6-symmetry problem.
Figure 5.14 shows the application of the three-step bottom-up hierarchical
approach applied to the 6-symmetry problem. The first step is to "identify
regularities." The three recoding rules are the regularities. The second step is
to "change representation." This step changes the representation of the original problem stated in terms of the six independent variables D0, D1, D2, D3,
D4, and D5 into a new problem stated in terms of the three new independent
variables, R0, R1, and R2. The third step is to "solve" the problem now that it
has been restated in terms of the new representation. The problem is solved
by taking the conjunction (AND R0 R1 R2).
Table 5.7 Tableau with ADFs for Boolean 6-symmetry problem.
Objective: Find a program that produces the value of the Boolean 6-symmetry function as its output when given the value of the six independent Boolean variables as input.
Architecture of the overall program with ADFs: One result-producing branch and one two-argument function-defining branch defining ADF0.
Parameters: Branch typing.
Terminal set for the result-producing branch: D0, D1, D2, D3, D4, and D5.
Function set for the result-producing branch: ADF0, AND, OR, NAND, and NOR.
Terminal set for the function-defining branch ADF0: The two dummy variables ARG0 and ARG1.
Function set for the function-defining branch ADF0: AND, OR, NAND, and NOR.
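The bottom-up change of representation for the 6-symmetry problem (recoding the pairs D0/D5, D1/D4, and D2/D3 as R0, R1, and R2) can be sketched in Python. The function and variable names are assumptions for illustration. The sketch groups the 64 original fitness cases by their recoded values and confirms that the target depends only on the three new variables and equals their conjunction.

```python
from itertools import product
from collections import defaultdict

def recode(d0, d1, d2, d3, d4, d5):
    """Change of representation: three equivalence bits (R0, R1, R2)."""
    return (d0 == d5, d1 == d4, d2 == d3)

def six_symmetry(*d):
    """Target function on the original six variables."""
    return all(d[j] == d[5 - j] for j in range(3))

# Collect, for every (R0, R1, R2) combination, the set of target values
# observed over the original fitness cases that map to it.
table = defaultdict(set)
for case in product([False, True], repeat=6):
    table[recode(*case)].add(six_symmetry(*case))

# One value of the dependent variable per combination of new variables.
well_defined = all(len(outputs) == 1 for outputs in table.values())
# That value is the conjunction (AND R0 R1 R2).
solved_by_and = all(outputs == {r0 and r1 and r2}
                    for (r0, r1, r2), outputs in table.items())
```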
This problem can appropriately employ the same straightforward
approach to choosing the terminal set and function set for the two branches
as the sextic-polynomial problem and the two-boxes problem. First, the
terminal set, T_rpb, of the result-producing branch is the same as the terminal set, T, of the problem when automatically defined functions were not
being used (i.e., the actual variables of the problem, D0, D1, D2, D3, D4,
and D5). Second, the function set, F_rpb, of the result-producing branch is
the union of the available automatically defined functions (just ADF0 here)
and the function set, F, when automatically defined functions were not
being used. Third, the terminal set, T_adf, of the function-defining branch
consists of as many dummy variables as the chosen arity of the automatically defined function involved. Since the automatically defined function has two arguments here, T_adf consists of ARG0 and ARG1. Fourth, the
function set, F_adf, of the function-defining branch is the same as the function set, F, of the problem when automatically defined functions were not
being used.
Table 5.7 summarizes the key features of the problem of symbolic regression of the Boolean 6-symmetry function using automatically defined
functions.
5.2.1.4 Results with ADFs
In one run, the following 100%-correct 78-point program emerged on
generation 16:
(progn (defun ADF0 (ARG0 ARG1)
(values (NAND (NOR (NOR (AND ARG0 ARG1) (OR ARG0 ARG1))
(NOR (AND ARG0 ARG0) (NOR ARG0 ARG0))) (OR (NOR (AND
ARG1 ARG1) (NOR ARG0 ARG0)) (NOR (NOR ARG1 ARG0) (OR
ARG0 ARG0))))))
(values (NOR (ADF0 (AND (AND D1 D5) (NAND (NOR (NAND D4
D1) (NOR (ADF0 D1 D4) (AND D1 D5))) (AND (OR D4 D3) (OR
D3 D4)))) (AND (ADF0 D2 D3) (NAND (NOR (NAND D4 D1)
(ADF0 D2 D3)) (OR D5 D1)))) (NAND (ADF0 D1 D4) (ADF0 D5
D0))))).
The average structural complexity, S_with, of the 100%-correct solutions
from the 18 successful runs (out of 28 runs) of the 6-symmetry problem is
78.8 points with automatically defined functions. The program above is
typical as to the size of the solutions produced by genetic programming
with automatically defined functions.
In another run, the following 100%-correct program of below-average size
(66 points) emerged on generation 13:
(progn (defun ADF0 (ARG0 ARG1)
(values (NAND (OR (NAND (OR ARG1 ARG0) (AND ARG0 ARG1))
(NOR ARG1 ARG1)) (NAND (NAND (OR ARG0 ARG0) (NAND
(AND ARG1 ARG0) (NAND ARG0 (OR (NOR (AND ARG1 ARG1)
(NOR ARG1 ARG1)) (NAND (NOR (NOR ARG0 ARG1) (OR ARG1
ARG0)) (AND ARG1 ARG1)))))) (NOR (OR ARG0 ARG1) (AND
ARG1 ARG0))))))
(values (AND (ADF0 D5 D0) (AND (ADF0 D1 D4) (ADF0 (ADF0
(AND D2 D2) D3) (ADF0 D0 D0)))))).
In this program the 49-point ADF0 computes the equivalence of its two
Boolean arguments. Then the 17-point result-producing branch computes the
three-way conjunction of the equivalence of D5 and D0, the equivalence of D1
and D4, and the value returned by the last nine points of this result-producing branch. This three-way conjunction is computed by the first two ANDs
appearing in the result-producing branch. The last nine points of the result-producing branch,
(ADF0 (ADF0 (AND D2 D2) D3) (ADF0 D0 D0)),
compute the equivalence of D2 and D3 in a rather complicated way. Since
(ADF0 D0 D0) is always true and (ADF0 <<X>> T) is always <<X>>, for
all <<X>>, the outer ADF0 of this nine-point subexpression returns the value
of (ADF0 (AND D2 D2) D3) which, in turn, simplifies to (ADF0 D2 D3).
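The claims made about the 66-point program can be checked mechanically. The Python transcription below is a sketch: it assumes the reading of the printed listing in which the two dummy variables are ARG0 and ARG1 and the four primitives are AND, OR, NAND, and NOR, and it models T and NIL as Python True and False.

```python
from itertools import product

def AND(a, b): return a and b
def OR(a, b): return a or b
def NAND(a, b): return not (a and b)
def NOR(a, b): return not (a or b)

def adf0(arg0, arg1):
    """Transcription of the 49-point function-defining branch."""
    return NAND(
        OR(NAND(OR(arg1, arg0), AND(arg0, arg1)), NOR(arg1, arg1)),
        NAND(
            NAND(OR(arg0, arg0),
                 NAND(AND(arg1, arg0),
                      NAND(arg0,
                           OR(NOR(AND(arg1, arg1), NOR(arg1, arg1)),
                              NAND(NOR(NOR(arg0, arg1), OR(arg1, arg0)),
                                   AND(arg1, arg1)))))),
            NOR(OR(arg0, arg1), AND(arg1, arg0))))

def result_producing_branch(d0, d1, d2, d3, d4, d5):
    """Transcription of the 17-point result-producing branch."""
    return AND(adf0(d5, d0),
               AND(adf0(d1, d4),
                   adf0(adf0(AND(d2, d2), d3), adf0(d0, d0))))

# ADF0 should behave as two-argument equivalence.
adf0_is_eqv = all(adf0(a, b) == (a == b)
                  for a, b in product([False, True], repeat=2))

# The whole program should agree with 6-symmetry on all 64 fitness cases.
hits = sum(result_producing_branch(*d) == (d[0] == d[5] and d[1] == d[4] and d[2] == d[3])
           for d in product([False, True], repeat=6))
```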
We can interpret this 66-point solution in terms of the top-down way of
describing the hierarchical three-step problem-solving process described in
chapter 1. First, the overall problem of computing the 6-symmetry function is
decomposed into the subproblem of finding the equivalence (EQV) of two
Boolean arguments. Second, ADF0 expresses the solution to this subproblem.
Third, the solution to the overall problem is assembled in the result-producing branch using the primitive AND function twice (the first two occurrences
of AND in the result-producing branch) and five different instantiations of the
solution to the subproblem of finding the two-argument equivalence (ADF0).
Although a human programmer would undoubtedly be more efficient and
invoke ADF0 only three times in solving this problem, the genetically evolved
program is 100%-correct.
The above solution to the 6-symmetry problem illustrates three of the five
ways itemized in chapter 3 by which the hierarchical problem-solving approach can be beneficial: hierarchical decomposition, parametrized reuse,
and abstraction.
First, the hierarchical decomposition is manifested by the fact that the overall
program for solving the problem consists of the 49-point automatically defined function ADF0 and the 17-point result-producing branch.
Second, the five times that ADF0 is invoked by the result-producing branch to compute
the equivalence of its two dummy variables illustrate parametrized reuse of
the solution to the ADF0 subproblem. Generalization comes from such parametrized reuse. ADF0 is a general way of determining equivalence and may
be reused on any combination of Boolean values or expressions. ADF0 is invoked five times by the result-producing branch with the following different
combinations of values for its two dummy variables:
• (ADF0 D5 D0),
• (ADF0 D1 D4),
• (ADF0 (AND D2 D2) D3),
• (ADF0 D0 D0), and
• (ADF0 (ADF0 (AND D2 D2) D3) (ADF0 D0 D0)).
Third, each time ADF0 is invoked by the result-producing branch with a
particular combination of values for its two dummy variables, abstraction is occurring. All actual variables of the problem that are not involved with that particular invocation of
ADF0 are momentarily irrelevant. For example, when (ADF0 D5 D0) is being
evaluated, D1, D2, and D3 are all irrelevant.
Figure 5.15 presents the performance curves based on the 28 runs of the
problem of symbolic regression for 6-symmetry with automatically
defined functions. The cumulative probability of success, P(M,i), is 61%
by generation 29 and is 64% by generation 50. The two numbers in
the oval indicate that if this problem is run through to generation 29,
processing a total of E_with = 2,400,000 individuals (i.e., 16,000 x 30 generations x 5 runs) is sufficient to yield a solution to this problem with
99% probability.
Figure 5.15 Performance curves for the 6-symmetry problem showing that E_with = 2,400,000
with ADFs.
Table 5.8 Comparison table for Boolean 6-symmetry problem.
                                   Without ADFs    With ADFs
Average structural complexity S    143.0           78.8
Computational effort E             4,368,000       2,400,000
Figure 5.16 Summary graphs for the 6-symmetry problem.
5.2.1.5 Comparison with and without ADFs
Table 5.8 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the Boolean 6-symmetry problem with automatically defined functions and without them. As
can be seen, the computational effort, E_with, required with automatically
defined functions is less than the computational effort, E_without, without
them. That is, automatically defined functions are beneficial for the scaled-up version of this problem. In addition, the average structural complexity
is considerably less with automatically defined functions than without
them, so automatically defined functions are also beneficial as to parsimony. However, as previously mentioned, we do not start seeing a reasonably consistent advantage in program size with automatically defined
functions until we start encountering the more difficult problems in chapter 6 and beyond.
Figure 5.16 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 1.82 and an efficiency ratio, R_E, of, by
coincidence, 1.82. The fact that the efficiency ratio is greater than 1 indicates
that automatically defined functions are beneficial for the scaled-up version
of this problem.
5.2.2 The Boolean 5-Symmetry Problem
In this subsection we state the comparable statistics for the symbolic regression of the simpler five-argument version of the Boolean symmetry problem.
The tableau for the 5-symmetry problem is identical to the tableau for the
6-symmetry problem (except for the target function, the actual variables of
the problem, the population size, and the number of fitness cases) and will
not be shown here.
5.2.2.1 Results without ADFs
When automatically defined functions are not used, the average structural
complexity, S_without, of the best-of-run programs from the 374 successful runs
(out of 375 runs) of the Boolean 5-symmetry problem is 57.4 points.
Figure 5.17 presents the performance curves based on these 375 runs for
the Boolean 5-symmetry problem without automatically defined functions.
The cumulative probability of success, P(M,i), is 91% by generation 14
and is 99.7% by generation 50. The two numbers in the oval indicate that
if this problem is run through to generation 14, processing a total of E_without
= 120,000 individuals (i.e., 4,000 x 15 generations x 2 runs) is sufficient to
yield a solution to this problem with 99% probability.
5.2.2.2 Results with ADFs
When automatically defined functions are used, the average structural complexity, S_with, of the best-of-run programs from the 60 successful runs (out of
62 runs) of the Boolean 5-symmetry problem is 72.1 points.
Figure 5.17 Performance curves for the Boolean 5-symmetry problem showing that
E_without = 120,000 without ADFs.
Figure 5.18 Performance curves for the 5-symmetry problem showing that E_with = 216,000
with ADFs.
Table 5.9 Comparison table for the Boolean 5-symmetry problem.
                                   Without ADFs    With ADFs
Average structural complexity S    57.4            72.1
Computational effort E             120,000         216,000
Figure 5.19 Summary graphs for the Boolean 5-symmetry problem.
Figure 5.18 presents the performance curves based on these 62 runs of the
Boolean 5-symmetry problem with automatically defined functions. The
cumulative probability of success, P(M,i), is 81% by generation 17 and is
97% by generation 50. The two numbers in the oval indicate that if this problem is run through to generation 17, processing a total of E_with = 216,000 individuals (i.e., 4,000 x 18 generations x 3 runs) is sufficient to yield a solution to
this problem with 99% probability.
5.2.2.3 Comparison with and without ADFs
Table 5.9 compares the average structural complexity, S_without and S_with, and
the computational effort, E_without and E_with, for the Boolean 5-symmetry problem with automatically defined functions and without them. The computational effort, E_with, required with automatically defined functions is greater
than the computational effort, E_without, without them. Automatically defined
functions are not beneficial for the simpler version of this problem.
Figure 5.19 summarizes the information in this comparison table and
shows a structural complexity ratio, R_S, of 0.80 and an efficiency ratio, R_E,
of 0.56. The fact that the efficiency ratio is less than 1 indicates that automatically defined functions are not beneficial for the simpler version of
this problem.
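The two ratios quoted for the 5-symmetry problem follow directly from the entries of the comparison table. The short Python sketch below assumes that each ratio is the without-ADFs value divided by the with-ADFs value, which is consistent with the figures quoted and with the reading that a ratio above 1 favors automatically defined functions; the function name is hypothetical.

```python
def ratio(without_adfs, with_adfs):
    """Without-ADFs value divided by with-ADFs value (greater than 1 favors ADFs)."""
    return without_adfs / with_adfs

# Values from the comparison table for the Boolean 5-symmetry problem.
structural_complexity_ratio = ratio(57.4, 72.1)       # R_S
efficiency_ratio = ratio(120_000, 216_000)            # R_E

adfs_beneficial_for_effort = efficiency_ratio > 1.0
```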
The simpler 5-symmetry problem appears to be on the non-beneficial
side of the breakeven point for computational effort and the 6-symmetry
problem is on the beneficial side of this breakeven point. The 6-symmetry
problem is not, of course, a particularly challenging target function for
Boolean symbolic regression using genetic programming. Thus, this
apparent breakeven point is located in a rather unchallenging area of the
space of all possible Boolean functions. If this is true, then symbolic
regression of all but the simplest Boolean functions may benefit from
automatically defined functions.
5.3 THE FOUR-SINE VERSUS THREE-SINE PROBLEMS
This section considers the problem of symbolic regression of the three-term expression
sin x + sin 2x + sin 3x
and the scaled-up version of this problem involving the four-term expression
sin x + sin 2x + sin 3x + sin 4x.
These two target functions for symbolic regression are scaled in terms of
the number of harmonics of a sinusoidal function. The additional harmonic
appearing in the four-term expression gives the four-term expression a slightly
greater amount of potentially exploitable regularity than the three-term
expression.
5.3.1 The Four-Sine Problem - sin x + sin 2x + sin 3x + sin 4x
Figure 5.20 graphs sin x + sin 2x + sin 3x + sin 4x in the interval [-π, +π].
Figure 5.20 Graph of sin x + sin 2x + sin 3x + sin 4x in the interval [-π, +π].
5.3.1.1 Preparatory Steps without ADFs
The preparatory steps for this problem are similar to those for the problems of floating-point symbolic regression of the sextic and quintic polynomials. However, the target function here is more difficult to learn than the sextic
polynomial. Experience indicates that we can expect to evolve only an
approximate solution to this target with our chosen population size of
4,000. We therefore stated the success predicate in terms of a cumulative
error of 13.000 over the 50 fitness cases (an average absolute error of 0.26
per fitness case). Given the range over which the
target function varies, this success predicate corresponds to an average
error of about 3% per fitness case.
Table 5.10 summarizes the key features of the problem of symbolic regression for sin x + sin 2x + sin 3x + sin 4x without automatically defined
functions.
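The fitness measures used for this problem can be sketched in a few lines of Python. This is only an illustration of the definitions of raw fitness, hits, and the success predicate given in the text; the function names are hypothetical, and we assume that the 50 evenly spaced fitness cases include both endpoints of the interval.

```python
import math

def target(x):
    """Four-sine target function of this section."""
    return math.sin(x) + math.sin(2 * x) + math.sin(3 * x) + math.sin(4 * x)

def fitness_cases(n=50, lo=-math.pi, hi=math.pi):
    """n evenly spaced values of x (assumption: both endpoints are included)."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]

def evaluate(program, cases, hits_criterion=0.05, success_threshold=13.0):
    """Return (raw fitness, hits, success) for a candidate program.

    Raw fitness is the sum of absolute errors over the fitness cases;
    standardized fitness is the same as raw fitness for this problem.
    """
    errors = [abs(program(x) - target(x)) for x in cases]
    raw = sum(errors)
    hits = sum(1 for e in errors if e < hits_criterion)
    return raw, hits, raw < success_threshold

cases = fitness_cases()
perfect = evaluate(target, cases)          # an exact solution
flat = evaluate(lambda x: 0.0, cases)      # a program that always returns 0
```

A program that reproduces the target exactly scores 50 hits with zero error, whereas the constant program is far from satisfying the success predicate.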
5.3.1.2 Results without ADFs
In one run without automatically defined functions, the following individual
satisfying the success predicate of this problem emerged in generation 43:
(% (? (% (% (* -0.2L0999 -0. 601L) (- (? 0.153397 x) (* x
0.25949))) (- (* x x) (n x 0.25949))) (- x x) I 1- (? 0.108994 x )
('k (- (* X X) (- X X) ) (- (* x (* (* ()k -0.210999 -0.507L\
Table 5.10 Tableau without ADFs for sin x + sin 2x + sin 3x + sin 4x.
Objective: Find a program that produces the given value of sin x + sin 2x + sin 3x + sin 4x as its output when given the value of the one independent variable, x, as input.
Terminal set without ADFs: X and the random constants ℜ_real.
Function set without ADFs: +, -, * and %.
Fitness cases: 50 evenly spaced values of X between -π and +π.
Raw fitness: The sum, over the fitness cases, of the absolute value of the error between the value returned by the program and the given value of the dependent variable.
Standardized fitness: Same as raw fitness.
Hits: The number of fitness cases for which the absolute value of the error is less than 0.05 (the hits criterion).
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program has standardized fitness below 13.000.
Figure 5.21 Performance curves for sin x + sin 2x + sin 3x + sin 4x showing that E_without = 1,472,000 without ADFs. (Panel "Without Defined Functions": P(M,i) and I(M,i,z) plotted against generation, with M = 4,000, z = 99%, R(z) = 8, and N = 57; labeled points (45, 46%) and (50, 47%).)
Table 5.11 Tableau with ADFs for sin x + sin 2x + sin 3x + sin 4x.
Objective: Find a program that produces the given value of sin x + sin 2x + sin 3x + sin 4x as its output when given the value of the one independent variable, x, as input.
Architecture of the overall program with ADFs: One result-producing branch and one one-argument function-defining branch.
Parameters: Branch typing.
Terminal set for the result-producing branch: X and the random constants ℜ_real.
Function set for the result-producing branch: +, -, * and % and the one-argument defined function ADF0.
Terminal set for the function-defining branch ADF0: The dummy variable ARG0 and the random constants ℜ_real.
Function set for the function-defining branch ADF0: +, -, * and %.
-0.5213) (- (* (- (- (* 0.010498 0.79s5) (+ x (* 0.010499
0.7955))) x) (* (- (* x x) (* x 0.25949)) (* (- (* x x) (* x
0.25949)) (" x 0.2s949)))) (+ (+ X 0.301804) (+ X o.25s4s) ))))
(+ (- (? 0.L53397 X) ('k (- (* x x) (* X 0.25949)) (- (* x
0.25949) (+ (+ X 0.301804) (* 0.010498 0.1955) )))) (* 0.010498
0.7ess)))))).
The average structural complexity, S_without, of the best-of-run programs from
the 27 successful runs (out of 57 runs) of the problem of symbolic regression
for sin x + sin 2x + sin 3x + sin 4x is 111.7 points without automatically defined
functions.
Figure 5.21 presents the performance curves based on these 57 runs of
the problem of symbolic regression for sin x + sin 2x + sin 3x + sin 4x without automatically defined functions. The cumulative probability of success, P(M,i), is 46% by generation 45 and is 47% by generation 50. The
two numbers in the oval indicate that if this problem is run through to
generation 45, processing a total of E_without = 1,472,000 individuals (i.e.,
4,000 × 46 generations × 8 runs) is sufficient to yield a satisfactory result
for this problem with 99% probability.
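The arithmetic behind the numbers in the oval can be sketched as follows. We assume the usual definition of the number of independent runs, R(z) = ceiling(log(1 - z) / log(1 - P(M,i))), and count generation 0, so that a run through generation i processes i + 1 generations. The function names are hypothetical.

```python
import math

def runs_required(p_success, z=0.99):
    """Independent runs R(z) needed to succeed at least once with probability z."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def individuals_to_process(m, generation, p_success, z=0.99):
    """I(M, i, z) = M * (i + 1) * R(z); generation 0 counts as a generation."""
    return m * (generation + 1) * runs_required(p_success, z)

# Without ADFs: P(M,i) = 46% by generation 45 with M = 4,000.
e_without = individuals_to_process(4000, 45, 0.46)
```

With P(M,i) = 46%, eight runs are required, and 4,000 × 46 × 8 gives the 1,472,000 individuals reported above.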
5.3.1.3 Preparatory Steps with ADFs
We now consider the problem of symbolic regression of
sin x + sin 2x + sin 3x + sin 4x using automatically defined functions.
Table 5.12 Number of invocations of ADF0 for sin x + sin 2x + sin 3x + sin 4x.
Run   Generation at which satisfactory result emerged   Number of invocations of ADF0
1     50     2
2     17     7
3     30     0
4     38     1
5     24     1
6     30     6
7     21     1
8     ?2     1
9     40     3
10    39     1
11    18     3
12    14     2
13    22     1
14    24     2
15    23     1
16    28     3
17    33     2
18    40     8
19    26     1
The preparatory steps for this problem are similar to those of the problem
of symbolic regression of the sextic polynomial.
Table 5.11 summarizes the key features of the problem of symbolic regression for sin x + sin 2x + sin 3x + sin 4x using automatically defined functions.
5.3.1.4 Results with ADFs
We made 37 runs of the problem of symbolic regression for
sin x + sin 2x + sin 3x + sin 4x, of which 19 were successful.
Table 5.12 shows the number of invocations of ADF0 for these 19 successful
runs. The number of invocations ranges from zero to eight, the average being
2.42. The result-producing branch ignores ADF0 in only one of these 19 runs
(run 3), indicating that there is a strong competitive advantage associated with
automatically defined functions. Curiously, none of the 19 runs invokes ADF0 exactly four times.
In run 1, the following 104-point program emerges on generation 50. It
invokes ADF0 twice, scores 22 (out of 50) hits, and has a standardized fitness
of 11.11.
(progn (defun ADF0 (ARG0)
(values (- (- (e" (% -0.4954 0.6199) (U 0.6342 0.40179))
(* (" (* ARG0 ARG0) (e" (- (- (- -0.3338 ARG0) (% ARG0
0.7341)) (* ARG0 -0.315002)) (* ARG0 ARG0))) (* (* (-
(% (% -0.4954 0.586) (+ ARG0 (+ ARG0 0.6199))) (* (-
0.82939 ARG0) (* (- (* (* (- 0.82939 ARG0) ARG0) (*
(% -0.4954 0.586) ARG0)) 0.071304) (A (- (- (% (%
-0.4954 0.586) (+ ARG0 ARG0)) (% -0.4954 0.586)) (*
ARG0 ARG0)) (+ (- 0.071304 0.332504) (+ 0.76689
-0.332)))))) (+ ARG0 0.078995)) (* ARG0 ARG0)))) (+
ARG0 0.078995))))
(values (% (% (+ (+ X X) X) (ADF0 (" X X))) (ADF0
-0.0867)))).
Figure 5.22 Comparison of target curve and best-of-run program of run 1 for
sin x + sin 2x + sin 3x + sin 4x.
Since this program's standardized fitness satisfies the success predicate of
this problem (requiring a standardized fitness of 13.000 or better) and is better than any other program in this series of 37 runs, this program is the best-of-all program for these runs.
Figure 5.22 compares the target curve with this best-of-all program over
the interval [-π, +π]. As can be seen, this program tracks the target curve very
well for many parts of the interval, but deviates in the areas where the target
curve oscillates near the x-axis.
In run 2, the following program from generation 17 invokes ADF0 seven
times, has a standardized fitness of 12.88, scores 16 hits, and satisfies the success predicate of the problem:
(progn (defun ADF0 (ARG0)
(values (- (* ARG0 ARG0) (- 0.08439636 0.3044014))))
(values (+ (% X (ADF0 (ADF0 (* X X)))) (% (ADF0 (ADF0
-0.05509901)) (% (% (ADF0 X) (+ -0.24420166 (+
0.9324951 0.5493927))) (* (+ (% X (ADF0 (* X (* X (ADF0
(* X (+ 0.33459473 0.19510078))))))) (% (* (- X X) X)
(% X (+ 0.9324951 0.5493927)))) (* X
0.19510078))))))).
Figure 5.23 Performance curves for sin x + sin 2x + sin 3x + sin 4x showing that E_with = 1,148,000 with ADFs. (Panel "With Defined Functions": P(M,i) and I(M,i,z) plotted against generation; labeled point (40, 49%).)
When ADF0 is invoked only once, the problem environment is being
decomposed, but it is not being decomposed in a way that results in any
reuse. Run 4 illustrates this. The program below from generation 38 invokes
ADF0 once, has a standardized fitness of 12.68, and satisfies the success predicate of the problem:
(progn (defun ADFO (ARGO )
(values (- (Z (* (+ 0.37509863 (* (* (+ 0.37609863 ( *
(* (+ 0.37609863 ('( (+ (* ARG0 ARGO) (? 0.A5239868
ARGO) ) ARGO) ) ARGO) ARGO) ) ARGO) ARGO) ) ARGO) (% (z
(- -0.1-0990143 ARG0) (+ 0.4028015 0.9240036)) ARG0))
(% 0.05239868 (+ ARGO (+ 0.37609863 (* (+ (* ARG0
ARG0) (% 0.05239868 ARGO)) ARG0)))))))
(values (eo (- (* (- X -0
. 06840515 ) (* (* (- (% -0 .2915006
x) x) (- 0.78369L4 (- 0.783591-4 -0.04L404724) ) ) x) ) x )
(ADF0 (* (* x x) (- 0.78369L4 (- x x) )))))) .
The average structural complexity, S_with, of the best-of-run programs
from the 19 successful runs (out of 37 runs) of the problem of symbolic
regression of sin x + sin 2x + sin 3x + sin 4x is 85.7 points with automatically
defined functions.
Figure 5.23 presents the performance curves based on these 37 runs of
the problem of symbolic regression for sin x + sin 2x + sin 3x + sin 4x with
automatically defined functions. The cumulative probability of success,
P(M,i), is 49% by generation 40 and is 51% by generation 50. The two
numbers in the oval indicate that if this problem is run through to generation 40, processing a total of E_with = 1,148,000 individuals (i.e., 4,000 ×
41 generations × 7 runs) is sufficient to yield a satisfactory result for this
problem with 99% probability.
Table 5.13 Comparison table for sin x + sin 2x + sin 3x + sin 4x.
                                  Without ADFs   With ADFs
Average structural complexity S   111.7          85.7
Computational effort E            1,472,000      1,148,000
Figure 5.24 Summary graphs for sin x + sin 2x + sin 3x + sin 4x.
5.3.1.5 Comparison with and without ADFs
Genetic programming sometimes produces algebraically correct solutions
to the problems of the quintic and sextic polynomials. However, the results
produced by genetic programming for this problem are more typical of
more complex problems in that the results are good approximations to the
target curves, but not algebraically correct solutions. For example, the two
results exhibited above from generations 50 and 17 do not resemble a sum
of four harmonics. They are, however, reasonably good approximations
to the target curve.
After seeing the above results from generations 50 and 17, the reader
may be straining to see any evidence of any regularity or modularity when
automatically defined functions are used. The result from generation 50
does exhibit some reuse (two invocations of ADFO) and the result from
generation 17 does exhibit some modularity (i.e., the problem is broken
into two parts). However, the amount of reuse is minimal and the modularity is completely mystifying. Moreover, the meager regularity and the
mystifying modularity are not grounded on the fact that the problem
actually involves four harmonics.
The reader may consequently be wondering whether there is any evidence that such meager regularity and modularity is beneficial. The answer
is that evidence comes in the form of the performance statistics with and
without automatically defined functions.
Table 5.13 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the problem of symbolic
regression for sin x + sin 2x + sin 3x + sin 4x with automatically defined functions and without them. As can be seen, automatically defined functions are
beneficial for the scaled-up version of this problem.
Figure 5.25 Graph of sin x + sin 2x + sin 3x in the interval [-π, +π].
Figure 5.24 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 1.30 and an efficiency ratio, R_E, of 1.28.
The fact that the efficiency ratio is greater than 1 indicates that automatically
defined functions are beneficial for the scaled-up version of this problem.
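Both ratios are simply the value without automatically defined functions divided by the value with them. A brief sketch using the figures reported for the four-sine problem follows; the function and variable names are hypothetical.

```python
def ratio(without_adfs, with_adfs):
    """Ratio of a statistic without ADFs to the same statistic with ADFs."""
    return without_adfs / with_adfs

# Four-sine problem: average structural complexity and computational effort.
r_s = ratio(111.7, 85.7)             # structural complexity ratio R_S
r_e = ratio(1_472_000, 1_148_000)    # efficiency ratio R_E
```

A ratio greater than 1 means that the quantity is larger without automatically defined functions, that is, that automatically defined functions are beneficial with respect to that quantity.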
Indeed, this problem makes the important point that genetic programming does not produce results in the style of a human programmer. However meager the regularity and however mystifying the modularity,
automatically defined functions have demonstrably extracted something
beneficial from this problem environment as evidenced by the fact that
they reduced the number of fitness evaluations required to yield a satisfactory result for this problem with 99% probability.
5.3.2 The Three-Sine Problem - sin x + sin 2x + sin 3x
In this subsection we state the comparable statistics for the symbolic regression of the simpler three-term target function sin x + sin 2x + sin 3x.
Figure 5.25 graphs sin x + sin 2x + sin 3x in the interval [-π, +π].
The tableau for the three-sine version of this problem is identical to the
tableau for the four-sine version (except that the target function is
sin x + sin 2x + sin 3x) and will not be shown here.
5.3.2.1 Results without ADFs
When automatically defined functions are not being used, the average
structural complexity, S_without, of the best-of-run programs from the 46
successful runs (out of 48 runs) of the problem of symbolic regression for
sin x + sin 2x + sin 3x is 86.0 points.
Figure 5.26 presents the performance curves based on these 48 runs of the
problem of symbolic regression for sin x + sin 2x + sin 3x without automatically defined functions. The cumulative probability of success, P(M,i), is 92%
by generation 35 and is 96% by generation 50. The two numbers in the oval
indicate that if this problem is run through to generation 35, processing a total
Figure 5.26 Performance curves for sin x + sin 2x + sin 3x showing that E_without = 288,000 without ADFs. (Panel "Without Defined Functions": P(M,i) and I(M,i,z) plotted against generation; labeled points (35, 92%) and (50, 96%).)
Figure 5.27 Performance curves for sin x + sin 2x + sin 3x showing that E_with = 324,000 with ADFs. (Panel "With Defined Functions": P(M,i) and I(M,i,z) plotted against generation; labeled point (50, 88%).)
Table 5.14 Comparison table for sin x + sin 2x + sin 3x.
                                  Without ADFs   With ADFs
Average structural complexity S   86.0           78.7
Computational effort E            288,000        324,000
Figure 5.28 Summary graphs for sin x + sin 2x + sin 3x.
of E_without = 288,000 individuals (i.e., 4,000 × 36 generations × 2 runs) is sufficient
to yield a satisfactory result for this problem with 99% probability.
5.3.2.2 Results with ADFs
When automatically defined functions are being used, the average structural
complexity, S_with, of the best-of-run programs from the 23 successful runs
(out of 26 runs) of the problem of symbolic regression for sin x + sin 2x + sin 3x
is 78.7 points.
Figure 5.27 presents the performance curves based on these 26 runs of the
problem of symbolic regression for sin x + sin 2x + sin 3x with automatically
defined functions. The cumulative probability of success, P(M,i), is 79% by
generation 26 and is 88% by generation 50. The two numbers in the oval indicate that if this problem is run through to generation 26, processing a total of
E_with = 324,000 individuals (i.e., 4,000 × 27 generations × 3 runs) is sufficient
to yield a satisfactory result for this problem with 99% probability.
5.3.2.3 Comparison with and without ADFs
Table 5.14 compares the average structural complexity, S_without and S_with, and
the computational effort, E_without and E_with, for the problem of symbolic regression for sin x + sin 2x + sin 3x with automatically defined functions and
without them. As can be seen, automatically defined functions are not beneficial for the simpler version of this problem.
Figure 5.28 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 1.09 and an efficiency ratio, R_E, of 0.89.
The fact that the efficiency ratio is less than 1 indicates that automatically
defined functions are not beneficial for the simpler version of this problem.
The performance statistics again indicate that the simpler version
of this problem is on the non-beneficial side of an apparent breakeven
point for computational effort whereas the scaled-up version is on the
beneficial side.
5.4 FOUR OCCURRENCES VERSUS THREE OCCURRENCES OF A
REUSABLE CONSTANT
When a particular constant value is needed in more than one place in a computer program, human programmers typically let some variable be the value
of some expression. A let can be viewed as an automatically defined function that takes no explicit arguments.
This section considers the problem of symbolic regression for the two-term
target expression x/π + x²/π² and the scaled-up version of this problem
involving the three-term expression x/π + x²/π² + 2πx for values of the independent variable x in the interval [0.5, 10.0].
These two target functions for symbolic regression are scaled in terms of
the frequency of use of the constant π in the two expressions. There is a slightly
greater amount of regularity in the three-term expression because π appears
four times in the three-term expression, but only three times in the two-term
expression.
We have seen (Genetic Programming, sections 10.2 and 10.11) how symbolic regression can be used to evolve a constant value. The goal in this
section is to evolve a zero-argument automatically defined function (a let)
that returns a constant value that can be used in multiple places in a main
program.
5.4.1 Three-Term Expression x/π + x²/π² + 2πx
We first consider the problem of evolving a program for the three-term
expression, x/π + x²/π² + 2πx, in which π appears four times.
Table 5.15 Tableau without ADFs for x/π + x²/π² + 2πx.
Objective: Find a program that produces x/π + x²/π² + 2πx as its output when given the value of the one independent variable, x, as input.
Terminal set without ADFs: X and the random constants ℜ_real.
Function set without ADFs: +, -, * and %.
Fitness cases: 10 random values of x between 0.5 and 10.0.
Raw fitness: The sum, over the fitness cases, of the absolute value of the error between the value returned by the program and the given value of the dependent variable.
Standardized fitness: Same as raw fitness.
Hits: The number of fitness cases for which the absolute value of the error is less than 0.05 (the hits criterion).
Wrapper: None.
Parameters: M = 4,000. G = 51. Different fitness cases are chosen for each run.
Success predicate: A program scores the maximum number of hits.
Figure 5.29 Performance curves for x/π + x²/π² + 2πx showing that E_without = 3,000,000 without ADFs. (Panel "Without Defined Functions": P(M,i) and I(M,i,z) plotted against generation, with R(z) = 15 and N = 45; labeled point (50, 27%).)
5.4.1.1 Preparatory Steps without ADFs
The preparatory steps for this problem without automatically defined
functions are similar to the previous problems of floating-point symbolic
regression.
Table 5.15 summarizes the key features of the problem of symbolic regression of x/π + x²/π² + 2πx without automatically defined functions.
5.4.1.2 Results without ADFs
In one run without automatically defined functions, the following
95-point best-of-generation program scoring 10 hits (out of 10) emerged in
generation 39:
(+ (- (- (- (% x x) (* 0.062698 (% (+ -0.140305 (% (+ (- x x) ( %
x x) ) x) ) 0.8454))) (* x -0.8292)) (- (* (- -0.754L96 -0.6319)
(* X X) ) (% (+ X -0.135101) (* -0.5475 -0.3238)))) (* (% ( +
-0.14030s 0.4821) (- (% (- (8 x x) (* 0.062698 (Z 1- x x )
0.8454))) (+ -0.3225A2 x)) (% (eo (eo (+ (- -0.3107 0.8454) ( *
-0.926 -0.15991) (% (- -0.j541,96 -0.6319) (+ x x) )) (* x x; ; q *
-0.5475 -A.3238)))) (- (" -0.163101_ 0.9323) (* (- -0.1541,96
-0.6319) (* x x) )))).
Of course, the random constants are intermixed throughout this expression in this program.
The average structural complexity, S_without, of the best-of-run programs
from the 12 successful runs (out of 45 runs) of the problem of symbolic
regression for x/π + x²/π² + 2πx is 86.6 points without automatically
defined functions.
Figure 5.29 presents the performance curves based on these 45 runs of
the problem of symbolic regression for x/π + x²/π² + 2πx without
Table 5.16 Tableau with ADFs for x/π + x²/π² + 2πx.
Objective: Find a program that produces x/π + x²/π² + 2πx as its output when given the value of the one independent variable, x, as input.
Architecture of the overall program with ADFs: One result-producing branch and one zero-argument function-defining branch.
Parameters: Branch typing.
Terminal set for the result-producing branch: X and the random constants ℜ_real.
Function set for the result-producing branch: +, -, * and % and the zero-argument defined function ADF0.
Terminal set for the function-defining branch ADF0: Random constants ℜ_real. (There are no dummy variables in ADF0 for this problem.)
Function set for the function-defining branch ADF0: +, -, * and %.
automatically defined functions. The cumulative probability of success,
P(M,i), is still 27% at generation 49 and is 27% by generation 50. The two
numbers in the oval indicate that if this problem is run through to generation 49, processing a total of E_without = 3,000,000 individuals (i.e., 4,000 × 50
generations × 15 runs) is sufficient to yield a satisfactory result for this
problem with 99% probability.
5.4.1.3 Preparatory Steps with ADFs
The principle being illustrated by this problem is to evolve an automatically defined function that returns a constant value that can be used in
more than one place in a main program. Consequently, given our intent
for this problem, the terminals in the terminal set of ADF0 are restricted to
random constants. ADF0 takes no arguments and has no access to x, the
actual variable of the problem. Only the result-producing branch has access
to x. If the constant that evolves in ADF0 is useful, it will be invoked
repeatedly by the result-producing branch. This repeated use of ADF0 should
result in some observable increase in efficiency in solving the problem.
Table 5.16 summarizes the key features of the problem of symbolic regression for x/π + x²/π² + 2πx with automatically defined functions.
Since ADF0 contains no variables and therefore always evaluates to the
same constant for every fitness case, a considerable amount of computer time
can be saved in this problem by evaluating the program tree in ADF0 once
and caching the value obtained.
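The caching idea can be sketched as follows. This is a minimal illustration and not the implementation used for the runs reported here: the tuple representation of program trees, the class name, and the small example trees are all hypothetical. The protected division function % returns 1 when division by zero is attempted.

```python
import operator

def protected_div(a, b):
    """Protected division %: returns 1 when division by zero is attempted."""
    return 1.0 if b == 0 else a / b

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "%": protected_div}

def eval_tree(tree, env):
    """Evaluate a program tree given as nested tuples, e.g. ("+", "X", 0.5)."""
    if isinstance(tree, tuple):
        op, *args = tree
        if op == "ADF0":                 # zero-argument defined function
            return env["ADF0"]()
        left, right = args
        return OPS[op](eval_tree(left, env), eval_tree(right, env))
    if tree == "X":
        return env["X"]
    return tree                          # a random constant

class CachedConstantADF:
    """Evaluates a variable-free ADF body once and reuses the value."""
    def __init__(self, body):
        self.body = body
        self.evaluations = 0
        self._value = None

    def __call__(self):
        if self._value is None:
            self.evaluations += 1
            self._value = eval_tree(self.body, {})
        return self._value

# Hypothetical example: ADF0 returns (0.5 + 0.25) * 2.0 = 1.5.
adf0 = CachedConstantADF(("*", ("+", 0.5, 0.25), 2.0))
rpb = ("+", ("*", "X", ("ADF0",)), ("ADF0",))   # X * ADF0 + ADF0
xs = [0.5, 1.0, 2.0, 4.0, 10.0]
results = [eval_tree(rpb, {"X": x, "ADF0": adf0}) for x in xs]
```

Although ADF0 is invoked twice for each of the five fitness cases, its body is evaluated only once.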
Figure 5.30 Best-of-run program scoring 10 hits from generation 44 for the symbolic regression
of x/π + x²/π² + 2πx with ADFs.
5.4.1.4 Results with ADFs
When automatically defined functions are used, genetic programming found
the following 42-point program scoring 10 hits (out of 10) in generation 44 of
one run:
(progn (defun ADF0 ()
(values (* (+ -0.8355 (+ (* 0.65849 -0.206299) (+ (*
0.65849 -0.206299) -0.7719))) (* 0.290802
0.492493))))
(values (+ (+ (+ (- (* (ADF0) (* (+ X -0.3852) 0.260505))
(% X -0.139198)) (* X (ADF0))) (* X (ADF0))) (* (* X
(ADF0)) (* X -0.3862))))).
Figure 5.30 shows this best-of-run program as a rooted, point-labeled tree
with ordered branches.
ADF0 is invoked four times in this evolved solution. The target expression
x/π + x²/π² + 2πx contains four occurrences of the constant π. The result-producing branch of this evolved program closely matches the target expression. However, ADF0 evaluates to -0.269115.
What happened to π? As is usually the case, the style of the program evolved
by genetic programming is nothing like the style of a program written by a
human programmer. A human programmer would, of course, notice the four
appearances of the common constant π in the three-term expression
x/π + x²/π² + 2πx and would probably write a let that bound the constant value 3.14159 to some named variable (perhaps called pi). The program with the let is simpler, more understandable, and more efficient because
the common constant can be computed once and then reused in several different places in the program. Genetic programming also uncovers a regularity consisting of a constant in this problem environment; however, the
regularity that it discovers and reuses is -0.269115, not 3.14159.
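The contrast between the two styles can be checked numerically. The sketch below transcribes the evolved 42-point program into Python as best it can be read from the listing (we assume that the first term multiplies ADF0 by its argument, and the constants are taken as printed) and compares it with a human-style version that binds π once. The sample points are an arbitrary grid over [0.5, 10.0], not the random fitness cases of the actual run.

```python
import math

def adf0():
    """Evolved zero-argument function; it contains only random constants."""
    return ((-0.8355 + (0.65849 * -0.206299
                        + (0.65849 * -0.206299 + -0.7719)))
            * (0.290802 * 0.492493))

def evolved(x):
    """Result-producing branch, invoking the evolved constant four times."""
    a = adf0()
    first = a * ((x + -0.3852) * 0.260505) - x / -0.139198
    return ((first + x * a) + x * a) + (x * a) * (x * -0.3862)

def target(x):
    """Human-style program: bind the constant once (the let), then reuse it."""
    pi = math.pi
    return x / pi + x ** 2 / pi ** 2 + 2 * pi * x

xs = [0.5 + 9.5 * k / 99 for k in range(100)]   # grid over [0.5, 10.0]
max_error = max(abs(evolved(x) - target(x)) for x in xs)
```

Under these assumptions the evolved constant is about -0.2691, and the evolved program stays within the hits criterion of 0.05 of the target over the whole interval, even though π itself never appears in it.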
Table 5.17 Constant values evolved in ADF0 in 10 runs of the symbolic regression of
x/π + x²/π² + 2πx.
Run   Constant value evolved in ADF0   Number of invocations of ADF0
1     -0.2691      4
2     4.1.066      3
3     4.2117       9
4     74.5470      9
5     4.5752       6
6     1.7371       5
7     7.7535       25
8     0.2238       6
9     -0.8334      14
10    -99.8550     7
Figure 5.31 Performance curves for x/π + x²/π² + 2πx showing that E_with = 2,280,000 with ADFs. (Panel "With Defined Functions": P(M,i) and I(M,i,z) plotted against generation; labeled points (37, 28%) and (50, 28%).)
Table 5.18 Comparison table for x/π + x²/π² + 2πx.
                                  Without ADFs   With ADFs
Average structural complexity S   86.6           94.3
Computational effort E            3,000,000      2,280,000
Figure 5.32 Summary graphs for x/π + x²/π² + 2πx.
Table 5.17 shows the constant value evolved for the 10 successful runs of
the problem of symbolic regression for x/π + x²/π² + 2πx and the number
of invocations of ADF0 for each run. In each instance, the evolved constant is
called repeatedly (with the number of invocations of the constant value ranging between 3 and 25). However, the evolved constant is not 3.14159 in any
instance. Moreover, there are precisely four invocations of ADF0 in only one
of the 10 runs.
The average structural complexity, S_with, of the best-of-run programs from
the 10 successful runs (out of 36 runs) of the problem of symbolic regression
for x/π + x²/π² + 2πx is 94.3 points with automatically defined functions.
Figure 5.31 presents the performance curves based on these 36 runs of
the problem of symbolic regression for x/π + x²/π² + 2πx with automatically defined functions. The cumulative probability of success, P(M,i), is
28% by generation 37 and is 28% by generation 50. The two numbers in
the oval indicate that if this problem is run through to generation 37, processing a total of E_with = 2,280,000 individuals (i.e., 4,000 × 38 generations
× 15 runs) is sufficient to yield a satisfactory result for this problem with
99% probability.
5.4.1.5 Comparison with and without ADFs
Table 5.18 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the problem of symbolic regression for x/π + x²/π² + 2πx, both with automatically defined
functions and without them. As can be seen, automatically defined functions are beneficial for the scaled-up version of this problem.
Figure 5.32 summarizes the information in this comparison table and
shows a structural complexity ratio, R_S, of 0.92 and an efficiency ratio,
Figure 5.33 Performance curves for x/π + x²/π² showing that E_without = 344,000 without ADFs. (Panel "Without Defined Functions": P(M,i) and I(M,i,z) plotted against generation; labeled points (42, 91%) and (50, 95%).)
R_E, of 1.32. The fact that the efficiency ratio is greater than 1 indicates that
automatically defined functions are beneficial for the scaled-up version of
this problem.
5.4.2 The Two-Term Expression x/π + x²/π²
In the simpler version of this problem, the target invokes π only three times.
The tableau for this version of this problem is identical to the tableau for the
previous version (except that the target function is x/π + x²/π²) and will
not be shown here.
5.4.2.1 Results without ADFs
In one run without automatically defined functions, the following 41-point
satisfactory result scoring 10 hits (out of 10) emerged in generation 14:
(* (+ 0.311905 X) (* 0.126205 (+ (- (- (+ (+ 0.311905 X)
0.633194) (* 0.106995 (* (+ (- (- (+ 0.311905 (- X 0.091904)) (*
0.126205 (+ (- X 0.091904) 0.633194))) 0.091904) 0.633194) (*
0.126205 (- X 0.00639343))))) 0.091904) 0.633194))).
As before, the random constants are intermixed throughout this expression
in this best-of-run program.
The average structural complexity, S_without, of the best-of-run programs
from the 21 successful runs (out of 22 runs) of the problem of symbolic
regression for x/π + x²/π² is 60.6 points without automatically defined
functions.
Figure 5.33 presents the performance curves based on these 22 runs of
the problem of symbolic regression for x/π + x²/π² without automatically defined functions. The cumulative probability of success, P(M,i), is
Figure 5.34 Performance curves for x/π + x²/π² showing that E_with = 864,000 with ADFs. (Panel "With Defined Functions": P(M,i) and I(M,i,z) plotted against generation; labeled point (50, 62%).)
91% by generation 42 and is 95% by generation 50. The two numbers in
the oval indicate that if this problem is run through to generation 42, processing a total of E_without = 344,000 individuals (i.e., 4,000 × 43 generations
× 2 runs) is sufficient to yield a satisfactory result for this problem with
99% probability.
5.4.2.2 Results with ADFs
In one run with automatically defined functions the following 76-point program scoring 10 hits (out of 10) emerged in generation 8:
(progn (defun ADF0 ()
(values (- (+ (* (- -0.5931 0.916) (+ -0.9225
-0.0545044) ) (+ (* 0.170105 -0.1652) (* 0.7413
0.3s8994))) (* (* (Z -0.9369 0.r3420L) (- 0.442
0.8804)) (- (* (+ -0.223602 0.9648) (+ -0.00400s43
0.6232)) (+ 0.2592 0.7084e6))))))
(values (? (- (* (+ X (ADFO)) (% x (ADFO))) (Z (- (- (* x
0.843) (* x -0.3687)) (% X x) ) (+ (+ (ADFO) (ADF0))
(ADFO)))) (- (+ (- x x) (+ (ADF0) x) ) (* (- (ADF0)
(ADFO)) (- -0.3031 x) ))))).
In this best-of-run program ADF0 evaluates to 3.2717. ADF0 is called six
times to provide this constant value for use in the calculation and two more
times to produce a zero.
The average structural complexity, S_with, of the best-of-run programs from
the 21 successful runs (out of 34 runs) of the problem of symbolic regression
for x/π + x²/π² is 73.4 points with automatically defined functions.
Figure 5.34 presents the performance curves based on these 34 runs of
the problem of symbolic regression for x/π + x²/π² with automatically
defined functions. The cumulative probability of success, P(M,i), is 42%
Table 5.19 Comparison table for x/π + x²/π².
                                  Without ADFs   With ADFs
Average structural complexity S   60.6           73.4
Computational effort E            344,000        864,000
Figure 5.35 Summary graphs for x/π + x²/π².
by generation 26 and is 62% by generation 50. The two numbers in the
oval indicate that if this problem is run through to generation 26, processing a total of E_with = 864,000 individuals (i.e., 4,000 × 27 generations × 8
runs) is sufficient to yield a satisfactory result for this problem with 99%
probability.
5.4.2.3 Comparison with and without ADFs
Table 5.19 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the problem of symbolic
regression for x/π + x²/π², with automatically defined functions and without them. As can be seen, automatically defined functions are not beneficial
for the simpler version of this problem.
Figure 5.35 summarizes the information in this comparison table and
shows a structural complexity ratio, R_S, of 0.82 and an efficiency ratio,
R_E, of 0.38. The fact that the efficiency ratio is less than 1 indicates that
automatically defined functions are not beneficial for the simpler version
of this problem.
Again we see that the simpler version of this problem is on the non-beneficial side of an apparent breakeven point for computational effort; however,
the scaled-up version is on the beneficial side.
5.5 SUMMARY
Table 5.20 compiles the observations from the 16 experiments in this chapter
into one table. The first four rows apply to the simpler version of each
problem and the last four rows apply to the scaled-up version of each
problem. As can be seen, for the simpler versions of each of the problems,
the efficiency ratio is less than 1 (indicating that fewer fitness evaluations
are required to yield a satisfactory result for the problem with a 99% probability without automatically defined functions than with them). As in
the simple two-boxes problem, automatically defined functions are not
beneficial for these simpler versions. However, for the scaled-up versions
of the four problems, the efficiency ratio is greater than 1 (indicating more
computational effort is required without automatically defined functions
than with them). That is, automatically defined functions are beneficial
for these problems. These four groups of problems straddle a breakeven
point for computational effort.
Figure 5.36 is a bar chart summarizing the values of the efficiency ratio,
R_E, for the simpler version (gray bars) and the scaled-up version (black bars)
of the four problems shown in table 5.20. The black bars representing the
scaled-up versions of the problems are on the beneficial side of the breakeven
point for computational effort for all four problems.
Automatically defined functions exhibit no consistent effect on the
average structural complexity of the evolved programs for the four problems in this chapter; however, they do exhibit a reasonably consistent
advantage as to parsimony for the more difficult problems encountered in
later chapters.
Table 5.20 Summary table of the structural complexity ratio, R_S, and the efficiency
ratio, R_E, for the simpler version and the scaled-up version of the four problems in
this chapter.
Problem                                        Structural complexity   Efficiency
                                               ratio R_S               ratio R_E
Quintic polynomial                             1.07                    0.33
Boolean 5-symmetry                             0.80                    0.56
Three-sines sin x + sin 2x + sin 3x            t.w                     0.89
Two-term x/π + x²/π²                           0.82                    0.38
Sextic polynomial                              0.98                    1.22
Boolean 6-symmetry                             r.82                    1.82
Four-sines sin x + sin 2x + sin 3x + sin 4x    1.30                    1.28
Three-term x/π + x²/π² + 2πx                   0.92                    1.32
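The break-even reading of table 5.20 can be sketched in a few lines of Python. This is only an illustration and not part of the original experiments: the dictionaries copy the efficiency ratios R_E from the table, and the helper name adfs_beneficial is hypothetical. It assumes the efficiency ratio is the computational effort without ADFs divided by the effort with them, so that a value above 1 means ADFs reduce the effort.

```python
# Efficiency ratios R_E copied from table 5.20 (simpler versions first).
SIMPLER = {
    "quintic polynomial": 0.33,
    "Boolean 5-symmetry": 0.56,
    "three-sines": 0.89,
    "two-term": 0.38,
}
SCALED_UP = {
    "sextic polynomial": 1.22,
    "Boolean 6-symmetry": 1.82,
    "four-sines": 1.28,
    "three-term": 1.32,
}


def adfs_beneficial(efficiency_ratio):
    """True when E_without / E_with exceeds 1, i.e. ADFs reduce the effort."""
    return efficiency_ratio > 1.0


for name, ratio in {**SIMPLER, **SCALED_UP}.items():
    side = "beneficial" if adfs_beneficial(ratio) else "non-beneficial"
    print(f"{name:22s} R_E = {ratio:.2f}  ({side} side)")
```

Every simpler version lands on the non-beneficial side and every scaled-up version on the beneficial side, which is the sense in which the four problems straddle the breakeven point.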
Figure 5.36 The efficiency ratios, R_E, of the scaled-up versions of all four problems in this
chapter are greater than 1.
Boolean Parity Functions
Genetic programming with automatically defined functions did not exhibit
any advantage in the two-boxes problem (chapter 4) in terms of the average
size of the evolved solutions or the computational effort when compared
to genetic programming without automatically defined functions.
For the simpler versions of the four problems presented in chapter 5,
automatically defined functions demonstrated no advantage in terms of
fitness evaluations; however, automatically defined functions were
advantageous for the scaled-up versions of each of the four problems. Thus,
there appears to be a breakeven point for computational effort for these
four problems; automatically defined functions facilitate problem solving
on the beneficial side of that breakeven point.
The problem of symbolic regression of the Boolean even-parity function
considered in this chapter is distinctly on the beneficial side of the breakeven
point for computational effort.
Parity problems are scaled in terms of the number of their arguments (arity,
order). The chapter begins by stating the problem of learning the parity
function. Multiple function-defining branches and hierarchical automatically
defined functions are introduced.
We then establish a baseline for the computational effort, E_without, without
automatically defined functions and the average structural complexity, S_without,
without automatically defined functions for the even-3-, 4-, 5-, and 6-parity
problems using a population size of 16,000. Then, values of E_with and S_with
are obtained with automatically defined functions for the same progression
of parity problems with the same population size. The efficiency ratio, R_E,
and the structural complexity ratio, R_S, are then computed for the parity
problems in this progression.
The advantages of automatically defined functions are further demonstrated
by solving the even-parity problems of order 7, 8, 9, 10, and 11 with a
population size of only 4,000.
6.1 THE EVEN-PARITY PROBLEM
The Boolean even-k-parity function of k Boolean arguments returns T if an even
number of its Boolean arguments are T, and otherwise returns NIL.
Table 6.1 Truth table for Boolean even-3-parity function.
Fitness case   D2    D1    D0    Even-3-parity
0              NIL   NIL   NIL   T
1              NIL   NIL   T     NIL
2              NIL   T     NIL   NIL
3              NIL   T     T     T
4              T     NIL   NIL   NIL
5              T     NIL   T     T
6              T     T     NIL   T
7              T     T     T     NIL
Figure 6.1 Boolean even-3-parity function with inputs of 1, 1, and 0 and an output of 1.
Figure 6.1 shows that the output of the even-3-parity function is 1 for inputs
of 1 (representing T), 1, and 0 (representing NIL).
A Boolean function can be represented and fully defined by its truth table.
Table 6.1 shows the truth table for the even-3-parity function. Each row of the
truth table corresponds to one of the eight fitness cases for this problem.
Parity functions are often used to check the accuracy of stored or transmitted
binary data in computers because a change in the value of any one of its
arguments toggles the value of the function. Because of this sensitivity to its
inputs, the parity function is difficult to learn and is often used as a
benchmark in the fields of machine learning and neural networks.
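The definition and the toggling property described above can be checked with a short Python sketch. It is an illustration only: Python's True and False stand in for T and NIL, the fitness cases are numbered as in table 6.1 with D0 as the least significant bit, and the function names are hypothetical.

```python
def even_parity(args):
    """Return True (T) when an even number of the Boolean arguments are True."""
    return sum(args) % 2 == 0


def truth_table(k):
    """Rows (fitness case, [D0, D1, ...], value) for the even-k-parity function."""
    rows = []
    for case in range(2 ** k):
        args = [bool((case >> j) & 1) for j in range(k)]  # args[j] is Dj
        rows.append((case, args, even_parity(args)))
    return rows


def toggles_on_single_flip(k):
    """Check that changing any one argument always changes the function value."""
    for _, args, value in truth_table(k):
        for j in range(k):
            flipped = list(args)
            flipped[j] = not flipped[j]
            if even_parity(flipped) == value:
                return False
    return True


for case, args, value in truth_table(3):
    d0, d1, d2 = args
    print(case, d2, d1, d0, value)
print(toggles_on_single_flip(3))
```

The printed rows reproduce the last column of table 6.1, and the final check confirms the sensitivity to every single input that makes parity useful for error detection and hard to learn.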
Throughout this book, we use the widely-used numbering scheme for identifying
Boolean functions wherein the values of the function for the 2^k combinations
of its k Boolean arguments are concatenated into a 2^k-bit binary
number and then converted to the equivalent decimal number. For example,
the values of the even-3-parity function for the 2^3 = 8 combinations of its three
inputs are 0, 1, 1, 0, 1, 0, 0, and 1 (reading table 6.1 up from the fitness case
consisting of three T values to the fitness case consisting of three NIL values).
Since 01101001 in binary equals 105 in decimal, the even-3-parity function is
referred to as three-argument Boolean rule 105. Similarly, the even-4-parity
function is four-argument Boolean rule 38,505; the even-5-parity function is
five-argument rule 1,771,476,585; and the 6-symmetry function (subsection 5.2.1)
is six-argument Boolean rule 9,225,659,030,704,492,545.
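The numbering scheme can be reproduced with a small Python sketch. It is offered as an illustration under two assumptions: fitness case c has Dj equal to bit j of c (as in table 6.1), so the value for the all-T case becomes the most significant bit, and the 6-symmetry function is taken to return T when its argument list reads the same in both directions, which is an assumption about the definition in subsection 5.2.1. The function names are hypothetical.

```python
def rule_number(func, k):
    """Decimal rule number of a k-argument Boolean function.

    The value for fitness case c is stored in bit c, so the all-T case
    is the most significant bit and the all-NIL case the least significant.
    """
    number = 0
    for case in range(2 ** k):
        args = [bool((case >> j) & 1) for j in range(k)]  # args[j] is Dj
        if func(args):
            number |= 1 << case
    return number


def even_parity(args):
    return sum(args) % 2 == 0


def symmetry(args):
    # Assumed definition: D0 = D(k-1), D1 = D(k-2), and so on.
    return args == args[::-1]


print(rule_number(even_parity, 3))
print(rule_number(even_parity, 4))
print(rule_number(even_parity, 5))
print(rule_number(symmetry, 6))
```

Under these assumptions the four printed values agree with the rule numbers quoted in the text.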
6.2 PREPARATORY STEPS WITHOUT ADFs
To establish a baseline for measuring the effectiveness of automatically
defined functions, we first measure the performance of genetic programming
in solving the even-3-, 4-, 5-, and 6-parity problems without automatically
defined functions.
In applying genetic programming to the even-parity problem of k arguments,
the terminal set, T, consists of the k Boolean arguments D0, D1, D2, ...
involved in the problem, so that
T = {D0, D1, D2, ...}.
The function set, F, consists of the following computationally complete set
of four primitive Boolean functions:
F = {AND, OR, NAND, NOR}
with an argument map for the function set, F, of
{2, 2, 2, 2}.
The Boolean even-parity functions appear to be the most difficult Boolean
functions to find with a blind random search of a space of programs composed
of functions from the function set, F, and terminals from the terminal
set, T. Even though there are only 256 different Boolean functions with three
arguments and one output, the Boolean even-3-parity function is so difficult
to find by a blind random search of program space (over F and T) that we did
not encounter a random program with this behavior after 10,000,000 trials
(Genetic Programming, table 9.3). In addition, the even-3-parity function appears
to be the most difficult to learn using genetic programming with the function
set, F, and terminal set, T (Genetic Programming, table 9.4). The even-parity
problem is much more difficult to learn than the symmetry problem of the
same order.
The set of possible fitness cases for this problem consists of the 2^k
combinations of the k Boolean arguments. Since this number of fitness cases is finite
and relatively small (for small values of k), we use all 2^k combinations as the
fitness cases for learning this function.
The raw fitness of a program is the number of fitness cases (out of 2^k) for
which the program returns the correct value. Raw fitness ranges between 0
and 2^k, and a larger value is better.
The standardized fitness of a program is the sum, over the 2^k fitness cases,
of the Hamming distance (error) between the value returned by the program
and the correct value of the Boolean function. Standardized fitness ranges
between 0 and 2^k, and a value closer to 0 is better. Raw fitness is 2^k minus
standardized fitness.
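The two fitness measures can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the runs in this chapter: candidate programs are represented as ordinary Python callables rather than evolved trees, and the names fitness, nor_candidate, and perfect_candidate are hypothetical.

```python
def nor(a, b):
    return not (a or b)


def even_parity(args):
    return sum(args) % 2 == 0


def fitness(program, k):
    """Return (raw, standardized) fitness of a k-argument candidate program.

    Raw fitness counts the fitness cases answered correctly; standardized
    fitness sums the errors (Hamming distance) over the same 2**k cases.
    """
    raw = 0
    standardized = 0
    for case in range(2 ** k):
        args = [bool((case >> j) & 1) for j in range(k)]  # args[j] is Dj
        if bool(program(*args)) == even_parity(args):
            raw += 1
        else:
            standardized += 1
    return raw, standardized


# Two hypothetical candidates for the even-3-parity problem.
nor_candidate = lambda d0, d1, d2: nor(d0, d1)
perfect_candidate = lambda d0, d1, d2: not (d0 ^ d1 ^ d2)

print(fitness(nor_candidate, 3))
print(fitness(perfect_candidate, 3))
```

For every candidate the two returned values add up to 2^k, which is the relation between raw and standardized fitness stated above.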
Our goal here is to make performance curves based on a series of runs,
with a constant population size, of the even-3-, 4-, 5-, and 6-parity problems,
with and without automatically defined functions. Of these eight versions,
the even-6-parity problem without automatically defined functions will prove
to be the most difficult to solve. The largest population that our existing
computing equipment can handle is 16,000. In fact, a population of size 16,000 can
be handled only if the individual programs in the population are represented
as arrays (described in appendix D), rather than as LISP S-expressions. As
will shortly be seen, a population size, M, of 16,000 is barely satisfactory. It is
sufficiently large to yield solutions without automatically defined functions
on 100% of the runs of the even-3- and 4-parity problems and on 44% of the
runs of the even-5-parity problem.
Table 6.2 Tableau without ADFs for the even-3-parity problem.
Objective:                  Find a program that produces the value of the Boolean
                            even-3-parity function as its output when given the
                            values of the three independent Boolean variables as
                            its input.
Terminal set without ADFs:  D0, D1, and D2.
Function set without ADFs:  AND, OR, NAND, and NOR.
Fitness cases:              All 2^3 = 8 combinations of the three Boolean arguments
                            D0, D1, and D2.
Raw fitness:                The number of fitness cases for which the value
                            returned by the program equals the correct value of the
                            even-3-parity function.
Standardized fitness:       The standardized fitness of a program is the sum, over
                            the 2^3 = 8 fitness cases, of the Hamming distance (error)
                            between the value returned by the program and the
                            correct value of the Boolean even-3-parity function.
Hits:                       Same as raw fitness.
Wrapper:                    None.
Parameters:                 M = 16,000. G = 51.
Success predicate:          A program scores the maximum number of hits.
The computational effort, E, required to yield a solution to this problem,
of course, depends on the choice of the major parameter of population
size, M, as well as the choices for the minor quantitative and qualitative
control parameters of the run. We have no reason to believe that the
population size, M, of 16,000 and the maximum number of generations to be
run, G, of 51, is optimal for any particular size of parity problem. In any
event, no one pair of choices for M and G can be optimal over a range of
problem sizes. Constant values of M, G, and the minor parameters are
used here to minimize the variation in parameters while we do our
comparative analysis of performance.
Table 6.2 summarizes the key features of the problem of symbolic regression
for the even-3-parity function without automatically defined functions.
This table can be applied to the 4-, 5-, and 6-parity problems merely by
appropriately enlarging the terminal set; expanding the set of fitness cases; and
increasing the numerical range of values for raw fitness, standardized fitness,
and hits.
Figure 6.2 Performance curves for the even-3-parity problem showing that E_without = 96,000
without ADFs.
6.3 EVEN-3-PARITY WITHOUT ADFs
Genetic programming is capable of solving the even-3-parity problem without automatically defined functions.
In one run (out of 34 runs) of the even-3-parity problem without automatically
defined functions, genetic programming discovered the following
35-point program with a perfect value of raw fitness of 8 (out of a possible
value of 2^3 = 8) in generation 4:
(NAND (OR (AND D1 D2) (NAND (AND (NAND D0 D0) (NOR D1 D2)) (NAND
(NAND D1 D1) (AND D2 D0)))) (OR (OR (AND (AND D0 D2) D1) (NOR
(AND D2 D0) D1)) (NOR D2 D0))).
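The claim that this program scores 8 hits with 35 points can be checked mechanically. The Python sketch below is an illustration only: it uses a small hypothetical S-expression reader and evaluator (not the LISP system used in the book), and the program string is the 35-point program as transcribed here, so any transcription error would show up as a lower hit count.

```python
PRIMITIVES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
}

PROGRAM = """
(NAND (OR (AND D1 D2) (NAND (AND (NAND D0 D0) (NOR D1 D2)) (NAND
(NAND D1 D1) (AND D2 D0)))) (OR (OR (AND (AND D0 D2) D1) (NOR
(AND D2 D0) D1)) (NOR D2 D0)))
"""


def parse(text):
    """Turn an S-expression string into nested tuples (function, arg, arg)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            name = tokens[pos + 1]
            args = []
            pos += 2
            while tokens[pos] != ")":
                node, pos = read(pos)
                args.append(node)
            return (name, *args), pos + 1
        return tokens[pos], pos + 1  # a terminal such as D0

    tree, end = read(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after the program")
    return tree


def evaluate(tree, env):
    """Evaluate a parsed program for one fitness case."""
    if isinstance(tree, str):
        return env[tree]
    name, *args = tree
    return PRIMITIVES[name](*(evaluate(arg, env) for arg in args))


def count_points(tree):
    """Structural complexity: the number of functions and terminals."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_points(arg) for arg in tree[1:])


def hits(tree, k):
    """Number of the 2**k fitness cases on which the program is correct."""
    score = 0
    for case in range(2 ** k):
        env = {f"D{j}": bool((case >> j) & 1) for j in range(k)}
        target = sum(env.values()) % 2 == 0  # even-k-parity
        score += evaluate(tree, env) == target
    return score


tree = parse(PROGRAM)
print(count_points(tree), hits(tree, 3))
```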
The average structural complexity, S_without, of the 100%-correct solutions
from the 34 successful runs (out of 34 runs) of the even-3-parity problem
without automatically defined functions is 44.6 points.
Figure 6.2 presents the performance curves based on the 34 runs for the
even-3-parity problem without automatically defined functions. The cumulative
probability of success, P(M,i), is 100% by generation 5. Note that since
P(M,i) is computed from empirical data, this probability of 100% is not a
guarantee of a solution for this problem by generation 5. If a much larger
number of runs were made, some runs would not produce a solution by
generation 5. The two numbers in the oval indicate that if this problem is run
through to generation 5, processing a total of E_without = 96,000 individuals
(i.e., 16,000 x 6 generations x 1 run) is sufficient to yield a solution to this
problem with 99% probability. Only one run is required to solve this problem
with 99% probability with this large population size because P(M,i) exceeds
99% at generation 5. P(M,i) is only 91% at generation 4, thus making
R(M,i,z) = 2. The number of individuals to be processed is therefore 160,000
(i.e., 16,000 x 5 generations x 2 runs) for generation 4. This is higher than for
generation 5 (where R(M,i,z) is 1). For generations 6 through 50, I(M,i,z) is
necessarily higher than it is for generation 5 because M and R(M,i,z) are
constant and i is greater than 5. Consequently, I(M,i,z) for generations 5 through
50 (and beyond) ramps up as the rising edge of a sawtooth. The fact that
P(M,i) is 0% for generation 0 indicates that a blind random search of 544,000
programs (i.e., 34 x 16,000) in the space of possible programs (over F and T)
did not yield even one solution to this problem.
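The arithmetic behind these figures can be sketched in Python. The sketch assumes the usual definition from earlier in the book, namely that the number of independent runs needed to reach probability z is the smallest integer R with 1 - (1 - P(M,i))^R >= z, and that I(M,i,z) = M x (i + 1) x R. The function names are hypothetical.

```python
import math


def runs_required(p, z=0.99):
    """R(M,i,z): independent runs needed so that at least one succeeds
    with probability z, when each run succeeds with probability p."""
    if p <= 0.0:
        raise ValueError("no successful runs, so R is undefined")
    if p >= 1.0:
        return 1
    return max(1, math.ceil(math.log(1.0 - z) / math.log(1.0 - p)))


def individuals_to_process(m, i, p, z=0.99):
    """I(M,i,z) = M x (i + 1) generations x R(M,i,z) runs."""
    return m * (i + 1) * runs_required(p, z)


print(individuals_to_process(16_000, 5, 1.00))  # generation 5 of the even-3-parity runs
print(individuals_to_process(16_000, 4, 0.91))  # generation 4 of the even-3-parity runs
```

The computational effort E is the minimum of I(M,i,z) over the generations i, which is why the value at generation 5 is reported even though more individuals would be needed if the runs were stopped at generation 4.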
6.4 EVEN-4-PARITY WITHOUT ADFs
Genetic programming discovered the following 127-point program with a
perfect value of raw fitness of 16 (out of 16) in generation 13 of one run of the
even-4-parity problem without automatically defined functions:
(NAND (NOR (NAND (NOR (AND (NOR D3 D1) (AND D2 D0)) (NAND (OR
(NAND (NOR D2 D0) (AND D1 D3)) (OR (AND D1 D0) (NOR D2 (OR (OR
D3 D2) (OR D3 D3))))) (NAND (AND (OR D2 D1) (AND D0 D1)) (OR
(AND D3 D2) (NOR D2 D3))))) (OR (AND (OR D0 D2) (NAND (AND D0
D3) (AND D2 D1))) (NAND (NOR D2 D1) (NOR D3 D0)))) (NOR (NAND
(AND (NOR (AND D1 D0) (NOR D2 D3)) (OR (AND D2 D0) (OR D3 D1))) (NOR
(AND D3 D2) (NOR D0 D1))) (AND (NOR (AND D3 D3) (NAND (NOR
D2 (AND (NOR D1 D0) (NAND D3 D2))) (AND D1 D3))) (NOR (OR D0 D3)
(AND D3 D0))))) (OR (OR D1 D0) (AND (NAND D3 D2) (NAND D1
D2)))).
The average structural complexity, S_without, of the 100%-correct solutions
from the 18 successful runs (out of 18 runs) of the even-4-parity problem
without automatically defined functions is 112.6 points.
Figure 6.3 presents the performance curves for the even-4-parity problem
without automatically defined functions over the series of 18 runs. The
cumulative probability of success, P(M,i), is 100% by generation 23. The
numbers in the oval indicate that, if this problem is run through to generation 23,
processing a total of E_without = 384,000 individuals (i.e., 16,000 x 24 generations
x 1 run) is sufficient to yield a solution to this problem with 99% probability.
Figure 6.3 Performance curves for the even-4-parity problem showing that E_without = 384,000
without ADFs.
6.5 EVEN-5-PARITY WITHOUT ADFs
The following 321-point program with a perfect value of raw fitness of 32
(out of 32) was discovered in generation 38 in one run of the even-5-parity
problem without automatically defined functions:
(NAND (NAND (NOR (NOR (NOR (NAND D1 D2) (AND D1 D3)) (NOR (NOR (NOR
(NAND D1 D2) (AND D1 D3)) (AND (NAND (OR (OR D3 D1) (NOR D1
D4)) (NOR (OR D0 D0) (NOR D2 D1))) (OR (AND (OR D4 D1) (AND D2
D0)) (NAND (NAND D2 D2) (NOR D1 (NOR D2 D0)))))) (NAND (NAND
(AND D2 (NAND (AND (AND (NAND D3 (NAND D2 D0)) (OR (OR D1 D1)
(AND D3 D2))) D0) D0)) (AND D1 D4)) (NAND (AND D2 D0) (NOR (NAND D4
D0) N:))))) (NAND (NAND (AND D2 (NAND (AND (AND (NAND D3 D2)
(OR (OR D1 D1) (AND D3 D2))) D0) D0)) (AND D1 D4)) (NAND (AND D2 D0)
(NOR (NOR (OR D0 D2) (AND (NOR D4 (AND (NAND D4 D3) (OR D0 D2)))
(NAND (NAND D0 D4) (OR D0 D1)))) (AND D2 (NAND (AND (AND
(NOR (NOR (AND D1 D3) (NOR D3 D4)) (AND D1 D3)) (OR (OR D1 D1)
(AND D3 D2))) D0) D0)))))) (OR (NAND (NAND D3 D4) (NAND D1 D2))
(NOR D3 D4))) (NAND (NAND (NAND (NOR D4 D3) (OR D0 D4)) (NOR (NOR
(OR (NAND (NAND (NAND D1 (NOR D3 D4)) (NOR D2 D1)) (OR (OR
D3 D4) (NOR (NOR (OR D0 D0) (NOR D2 D1)) D1))) (OR (AND D4 D3)
(NOR D3 D4))) (AND (OR (NAND (NOR (NOR D4 D0) (OR (NOR (NAND D1
D2) (NOR D3 D4)) D1)) (AND (AND D2 (NAND (AND D2 D0) (NAND D3
D3))) (AND D1 D1))) (OR D2 D2)) (AND (OR D0 D2) (OR D1 D0))))
(NAND (NAND (AND (NAND D3 D4) (NOR (AND (NOR D0 D3) (NAND D2
D0)) (OR (NOR D3 D4) (OR D2 D1)))) (NOR (NOR D4 D0) (NOR (OR
(NOR D2 D1) D0) (AND (NOR D4 (AND (NAND D4 D0) (NOR D1 D4)))
(NAND (NAND D0 D4) (OR D0 D1)))))) (NAND (AND D1 D3) (AND (AND
D2 (OR D3 D0)) (AND D2 (AND D1 D4))))))) (NOR (AND D0 D2)
(NOR D3 D4)))).
Notice the unwieldy size of this 321-point solution. Indeed, the average
structural complexity, S_without, of the 100%-correct solutions from the 11
successful runs (out of 25 runs) of the even-5-parity problem without
automatically defined functions is 299.9 points.
Figure 6.4 presents the performance curves for the even-5-parity problem
without automatically defined functions over the series of 25 runs. The
cumulative probability of success, P(M,i), is 44% by generation 50. The numbers
in the oval indicate that, if this problem is run through to generation 50,
processing a total of E_without = 6,528,000 individuals
(i.e., 16,000 x 51 generations x 8 runs) is sufficient to yield a solution to this
problem with 99% probability.
Figure 6.4 Performance curves for the even-5-parity problem showing that E_without = 6,528,000
without ADFs.
Recall that the computational effort, E_without, was 96,000 for the 3-parity
problem and was 384,000 for the 4-parity problem. Now it is 6,528,000 for the
5-parity problem. In other words, as the order of this problem increases, the
computational effort without automatically defined functions grows rapidly
and nonlinearly.
6.6 EVEN-6-PARITY WITHOUT ADFs
We are unable to continue the progressive comparison of the computational
effort, E_without, necessary to solve the even-parity problem with increasing
numbers of arguments without automatically defined functions with our
chosen population size, M, of 16,000 and our chosen maximum number of
generations to be run, G, of 51.
We made 19 runs of the even-6-parity problem without automatically
defined functions with a population size of 16,000 and with each run
being abandoned, as usual, after generation 50. Every run made progress
toward solving the problem; however, none found a 100%-correct solution.
Figure 6.5 shows, by generation, the progress made by the 19 unsuccessful
runs of the even-6-parity problem without automatically defined functions.
The curve on the bottom is the standardized fitness, by generation, of the best
of the 19 best-of-generation programs. For example, the best of the 19
best-of-generation programs for generation 0 has a standardized fitness of 29 (i.e., it
scores 35 hits out of 64); the best of the 19 best-of-generation programs for
generation 50 has a standardized fitness of 6 (i.e., it scores 58 hits out of 64).
The curve in the middle is the average, by generation, of the standardized
fitness of the 19 best-of-generation programs. It reaches 9.1 (54.9 hits) by
generation 50. The curve on the top is the average, by generation, of the 19
values of the average standardized fitness of the population of 16,000 as a
whole. It reaches 13.1 by generation 50. All three curves indicate that genetic
programming is making progress in solving this problem.
Figure 6.5 Three measures showing the progressive improvement in fitness of the 19
unsuccessful runs of the even-6-parity problem without ADFs.
Genetic programming is presumably capable of solving the even-6-parity
problem without automatically defined functions. This problem is
surely solvable with a larger population size; however, 16,000 is the largest
population our existing computing equipment can handle. This problem
is probably also solvable by continuing the runs for additional
generations.
Although we did not get one solution, much less the multiple solutions
necessary for construction of a meaningful performance curve, we can make
a rough estimate of the computational effort, E_without, for this problem.
Suppose that the number of hits leaps to 64 on generation 50 of the 19th run so
that the final run of this series becomes successful. The probability of success,
P(16,000, 50, 0.99), then becomes 0.053, instead of 0.0. If we then compute
R(z) and E_without in the usual way based on this hypothetical success and this
admittedly inadequate number of successes, we find that R(z) would be 86
and that E_without would be 70,176,000 (i.e., 16,000 x 51 generations x 86 runs).
This rough estimate probably understates the true value of E_without. Based on
this rough estimate, the progression of values of E_without for the even-3-, 4-,
5-, and 6-parity problems without automatically defined functions then
becomes 96,000, 384,000, 6,528,000, and 70,176,000. That is, there is an
explosive growth in E_without as the problem is scaled up along the dimension of the
number of arguments.
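The hypothetical estimate can be written out step by step. The derivation below is a sketch that assumes the usual definition of R(z) as the smallest number of independent runs for which the probability of at least one success reaches z = 0.99; the intermediate value 85.2 is rounded.

```latex
\begin{align*}
P(M,i) &= \tfrac{1}{19} \approx 0.053, \\
R(z) &= \left\lceil \frac{\ln(1-z)}{\ln\bigl(1-P(M,i)\bigr)} \right\rceil
      = \left\lceil \frac{\ln 0.01}{\ln(18/19)} \right\rceil
      = \lceil 85.2 \rceil = 86, \\
E_{without} &= M \times (i+1) \times R(z)
      = 16{,}000 \times 51 \times 86 = 70{,}176{,}000.
\end{align*}
```

Because a single success out of 19 runs is the most optimistic outcome consistent with the data, any smaller true success probability would raise R(z), which is why the estimate is likely an understatement.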
Even though none of the 19 runs actually produced a solution to the
problem, the average structural complexity, S_without, of the solutions can also
be roughly estimated.
Figure 6.6 shows, by generation, the average of the structural complexity
of the 19 best-of-generation programs (called the average of the best) and the
average, over the 19 runs, of the average value of the structural complexity of
the 16,000 programs in the population as a whole (called the average of
the average).
On generation 50, the 19 best-of-generation programs score between 52
and 58 hits. The average of the structural complexity values for these 19
programs is 328.0. Based on the upward trend of this curve, the true value of the
average structural complexity, S_without, of a set of actual 100%-correct
solutions is likely to be somewhat above 328.0. Thus, we adopt 328.0 as a rough
(probably understated) estimate for S_without for solutions to the even-6-parity
problem without automatically defined functions.
6.7 MULTIPLE FUNCTION-DEFINING BRANCHES
In all the problems in chapters 4 and 5, there was only one function-defining
branch in the overall computer program when automatically defined
functions were being used. The parity problem will illustrate the more general
situation where there are multiple function-defining branches.
A human programmer writing code for the even-3-parity or even-4-parity
functions would probably choose to call upon either the odd-2-parity
function (also known as the exclusive-or function XOR, the inequality
function, and two-argument Boolean rule 6) or the even-2-parity function
(also known as the equivalence function EQV, and two-argument
Boolean rule 9).
For example, given the function set available here, the human programmer
writing code for the even-3-parity of D0, D1, and D2 might write something
like the following:
1  ;;;- definition of the two-argument exclusive-or function
2  ;;;  ODD-2-PARITY (XOR)-
3  (progn (defun ODD-2-PARITY (arg0 arg1)
4           (values (NOR (AND arg0 arg1)
5                        (NOR arg0 arg1))))
6  ;;;- main program for the even-3-parity of D0, D1, and D2-
7         (values (ODD-2-PARITY D0
8                   (nand (ODD-2-PARITY D1 D2)
9                         (ODD-2-PARITY D1 D2)))))
Lines 3 through 5 constitute the function definition for the two-argument
ODD-2-PARITY.
Lines 7 through 9, the main program, call the two-argument
ODD-2-PARITY function three times in order to compute the even-3-parity of D0,
D1, and D2.
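A direct Python transliteration makes it easy to confirm that this human-written program is correct. It is offered only as an illustration: the Python function names mirror the LISP ones, and even_3_parity stands for the main program of lines 7 through 9.

```python
def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


def odd_2_parity(arg0, arg1):
    """Exclusive-or built from the primitives, as in lines 3 through 5."""
    return nor(arg0 and arg1, nor(arg0, arg1))


def even_3_parity(d0, d1, d2):
    """Main program: three calls to odd_2_parity, as in lines 7 through 9."""
    return odd_2_parity(d0, nand(odd_2_parity(d1, d2),
                                 odd_2_parity(d1, d2)))


for case in range(8):
    d0, d1, d2 = (bool((case >> j) & 1) for j in range(3))
    print(case, d2, d1, d0, even_3_parity(d0, d1, d2))
```

The NAND of a value with itself is simply its negation, so the main program computes the exclusive-or of D0 with the negated exclusive-or of D1 and D2.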
A human programmer writing code for the even-5-parity function and
parity functions of higher order would probably also want to call upon either
the even-3-parity (three-argument Boolean rule 105) or the odd-3-parity
(three-argument Boolean rule 150) as building blocks. Parity functions of order three
greatly facilitate writing code for the higher-order parity functions. Neither
of these functions is, of course, in the original set, F, of available primitive
functions. The fact that the programmer might want to use both a 2-parity
function and a 3-parity function suggests that more than one
function-defining branch might be desirable for higher-order parity problems. Multiple
function-defining branches can be implemented merely by adding additional
defuns to the progn of the overall program.
Figure 6.7 shows an abstraction of the overall structure of a program with
two function-defining branches (each taking two dummy variables ARG0 and
ARG1) and the one result-producing branch.
Figure 6.6 Two structural complexity measures of the 19 unsuccessful runs of the
even-6-parity problem without ADFs.
Figure 6.7 Overall structure of a program with two function-defining branches and one
result-producing branch.
6.8 HIERARCHICAL AUTOMATICALLY DEFINED FUNCTIONS
It is common in ordinary programming to define one function in terms of
other already-defined functions. For example, a human programmer needing
the sine and cosine functions in several places in a main program would
write subroutines for them and then repeatedly call the subroutines from the
main program. Then, if the tangent function were needed, the programmer
might write a subroutine for the tangent in terms of the already-available sine
and cosine functions. Defining one subroutine in terms of other
already-defined subroutines creates a hierarchy of function definitions. Such
hierarchies leverage the value of previously written code. The sine function can be
called directly when it is needed and it will be invoked indirectly whenever
the tangent is needed.
Once there is more than one function-defining branch, the question arises
as to the nature of the relationship among the function-defining branches.
There are several possibilities.
First, there might be no references among the function-defining branches.
That is, ADF0, ADF1, etc. might appear in the function set of the
result-producing branch, but never in the function set of any automatically defined
function. In the context of a graph in which the points represent
automatically defined functions and a directed line represents a direct reference by
one automatically defined function to another, this possibility corresponds to
a graph consisting of isolated points.
Second, there might be no restriction on the nature of the references among
the function-defining branches. This possibility permits an automatically
defined function to refer recursively to itself (directly or indirectly). When an
automatically defined function refers to itself, its name can appear in its own
function set or in the function set of another automatically defined function
to which the first automatically defined function refers (directly or indirectly).
This possibility corresponds to a directed cyclic graph of references among
the function-defining branches.
Third, there might be a hierarchy of references among the function-defining
branches in which the name of any automatically defined function that
has already been defined (i.e., has already been evaluated sequentially by the
progn) may appear in the function set of any subsequent function definition.
For example, if there are three function-defining branches, ADF0 can appear
in the function set of ADF1 and ADF2, but not vice versa. ADF1 can appear in
the function set of ADF2, but not vice versa. This possibility corresponds to a
directed acyclic graph of references among the function-defining branches.
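The third possibility can be made concrete by building the function set of each branch and checking that the resulting reference graph has no cycle. The Python sketch below is illustrative only: it assumes that every branch also has the four primitive Boolean functions of this chapter available, and the branch label RPB (for the result-producing branch) and the function names are hypothetical.

```python
PRIMITIVES = ["AND", "OR", "NAND", "NOR"]


def hierarchical_function_sets(num_adfs):
    """Function set of each branch when ADFi may call only ADF0 .. ADF(i-1)."""
    sets = {}
    for i in range(num_adfs):
        sets[f"ADF{i}"] = PRIMITIVES + [f"ADF{j}" for j in range(i)]
    # The result-producing branch may call every automatically defined function.
    sets["RPB"] = PRIMITIVES + [f"ADF{j}" for j in range(num_adfs)]
    return sets


def has_cycle(function_sets):
    """Depth-first search for a cycle in the graph of references among branches."""
    graph = {name: [f for f in funcs if f in function_sets]
             for name, funcs in function_sets.items()}
    state = {}  # 1 = being visited, 2 = finished

    def visit(node):
        if state.get(node) == 1:
            return True
        if state.get(node) == 2:
            return False
        state[node] = 1
        if any(visit(nxt) for nxt in graph[node]):
            return True
        state[node] = 2
        return False

    return any(visit(node) for node in graph)


sets = hierarchical_function_sets(3)
for name, funcs in sets.items():
    print(name, funcs)
print(has_cycle(sets))
```

Allowing every ADF to appear in every function set, as in the second possibility, would make has_cycle return True, since an ADF could then reach itself directly or indirectly.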
The second and third possibilities are called hierarchical automatically
defined functions.
The first approach (used in chapters 15 to 20 herein) is especially
appropriate when the function-defining branches serve as pattern detectors that fire
when certain unrelated conditions are satisfied. The second approach involving
recursion will not be used at all in this book. The third approach will be
used most frequently herein.
Figure 6.8 shows an abstraction of an overall program with hierarchical
automatically defined functions. The first function-defining branch, ADF0,
consists of invariant points of type 2, 3, 4, and 5 above the dotted line and a
body of type 7. The second function-defining branch, ADF1, also consists of
invariant points of type 2, 3, 4, and 5 above the dotted line and a body
composed of points of a new type 9. Points of type 9 may contain references
to the already-defined function ADF0. In contrast, points of type 7 do not
contain references to ADF1. The result-producing branch consists of one
invariant point of type 6 and a body consisting of points of type 8. The
result-producing branch can refer to both ADF0 and ADF1.
Figure 6.8 Program with hierarchical ADFs.
The idea of hierarchical automatically defined functions can be illustrated
with the following overall program for solving the even-5-parity problem:
1  ;;;- definition of ADF0 for even-2-parity function-
2  (progn (defun ADF0 (arg0 arg1)
3           (values (OR (AND arg0 arg1)
4                       (NOR arg0 arg1))))
5  ;;;- definition of ADF1 for odd-3-parity function-
6         (defun ADF1 (arg0 arg1 arg2)
7           (values (adf0 arg0 (adf0 arg1 arg2))))
8  ;;;- main program for even-5-parity of D0, D1, D2, D3, D4-
9         (values (adf0 (adf0 (nand d3 d3) d4)
10                      (adf1 d0 d1 d2))))
Lines 2-4 define the even-2-parity function ADF0 of two dummy variables,
ARG0 and ARG1.
Lines 6-7 define the odd-3-parity function ADF1 of three dummy variables,
ARG0, ARG1, and ARG2. ADF1 is a hierarchical automatically defined function
because it references the already-defined ADF0.
Lines 9-10 are the result-producing main program. The result-producing
branch calls both ADF0 and ADF1 in solving the overall problem.
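The behavior claimed for each branch of this program can be confirmed with a Python transliteration. It is a sketch for checking purposes only; the Python names adf0, adf1, and result_producing_branch mirror the LISP program and are not part of the original.

```python
from itertools import product


def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


def adf0(arg0, arg1):
    """Even-2-parity (equivalence), as in lines 2-4."""
    return (arg0 and arg1) or nor(arg0, arg1)


def adf1(arg0, arg1, arg2):
    """Odd-3-parity defined hierarchically in terms of ADF0, as in lines 6-7."""
    return adf0(arg0, adf0(arg1, arg2))


def result_producing_branch(d0, d1, d2, d3, d4):
    """Main program of lines 9-10: two calls to ADF0 and one call to ADF1."""
    return adf0(adf0(nand(d3, d3), d4), adf1(d0, d1, d2))


correct = sum(
    result_producing_branch(*args) == (sum(args) % 2 == 0)
    for args in product([False, True], repeat=5)
)
print(correct)
```

The count printed is the number of hits out of the 32 fitness cases of the even-5-parity problem.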
The above 10-line program for the even-5-parity problem illustrates four
of the five ways itemized in chapter 3 in which the hierarchical
problem-solving approach can be beneficial: hierarchical decomposition, recursive
application of hierarchical decomposition, parameterized reuse, and
abstraction.
First, the hierarchical decomposition is illustrated by the fact that the
overall program for solving the problem consists of the two automatically defined
functions, ADF0 and ADF1, as well as a result-producing branch.
Second, the two times that the result-producing branch invokes ADF0 and
the two times that ADF1 invokes ADF0 illustrate parameterized reuse of the
solution to a subproblem. ADF0 is a general way of computing the
even-2-parity function. Generalization comes from such parameterized reuse.
Third, the recursive application of the entire three-step hierarchical
decomposition process is illustrated by the fact that ADF1 invokes ADF0. The
subproblem represented by ADF1 (i.e., the odd-3-parity function)
is solved by the same three-step hierarchical problem-solving process as is
used to solve the overall problem. The odd-3-parity subproblem is first
decomposed into the sub-subproblem represented by ADF0 (i.e., the
even-2-parity function). Then the sub-subproblem ADF0 is solved. Finally, the
solution to the odd-3-parity subproblem is obtained by assembling solutions
to the even-2-parity sub-subproblem. Note that the term "recursive application"
from chapter 3 does not involve recursion (in the sense of an ADF calling itself).
Fourth, each time ADF0 or ADF1 is invoked with a particular combination
of its two or three arguments, abstraction is occurring. All the other
variables of this problem are momentarily irrelevant.
6.9 PREPARATORY STEPS WITH ADFs
The explosive growth in the number of fitness evaluations and the average
structural complexity for solving progressively more difficult parity problems
can be controlled if we exploit the underlying regularities and symmetries of
these problems and hierarchically decompose the problem into more
tractable subproblems. This can be accomplished by discovering one or more
reusable functions parameterized by dummy variables.
In applying genetic programming with automatically defined functions to
the even-3-parity problem, we first decide on the architecture for the overall
program.
The Boolean parity function returns only a single Boolean value. This single
value will be the value returned by the result-producing branch of the
yet-to-be-evolved overall program.
We now consider the number of arguments to be possessed by the automatically defined functions.
There is usually no advantage to enabling automatically defined functions
to take more arguments than there are actual variables of the problem. This
suggests an upper bound on the number of arguments for the automatically
defined functions that is equal to the arity of the problem (i.e., 3 for the
3-parity problem).
A zero-argument automatically defined function merely provides a way
to evolve a constant (in a problem where there are no side effects on the
system and no global variables). In the case of the Boolean domain, there are
only two Boolean constants, T and NIL. These two constants are not particularly
useful and, in any event, can, if needed, be easily evolved without
recourse to an automatically defined function.
There are only four rather uninteresting one-argument Boolean functions.
Two are constant functions; one is the identity function; and the
fourth is the negation function. Since we already have both NAND and NOR
in the function set, the one-argument negation function does not seem to
be particularly useful.
Thus, two appears to be the practical lower bound on the number of arguments for interesting automatically defined functions in the Boolean domain.
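The count of one-argument Boolean functions can be checked by brute force. The following is a minimal sketch, assuming T and NIL map to True and False; the helper names nand, nor, and classify are introduced here only for illustration.

```python
from itertools import product

def nand(a, b):
    return not (a and b)

def nor(a, b):
    return not (a or b)

def classify(outputs):
    """Name the one-argument function whose outputs are (f(NIL), f(T))."""
    if outputs[0] == outputs[1]:
        return "constant"
    return "identity" if outputs == (False, True) else "negation"

# Every one-argument Boolean function is fixed by its two output values,
# so enumerating the output pairs enumerates the functions.
kinds = [classify(outputs) for outputs in product([False, True], repeat=2)]
print(kinds)

# Negation needs no ADF: either primitive applied to the same argument twice suffices.
negation_from_primitives = all(
    nand(x, x) == (not x) and nor(x, x) == (not x) for x in (False, True)
)
print(negation_from_primitives)
```

The enumeration gives two constants, the identity, and the negation, and the last check shows why negation adds nothing once NAND and NOR are in the function set.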
Three competing considerations affect our choice between two and three
as the number of arguments for the automatically defined functions for the
even-3-parity problem.
First, as a general principle, we want to impose as few a priori constraints as
practical in order to give the evolutionary process the opportunity to define
whatever functional subunits it might find useful. The availability of a
dummy variable to an automatically defined function does not create any
requirement that the function actually refer to that argument or actually use it
in any meaningful way. Thus, this first consideration suggests a choice of
three, rather than two (for this problem of arity 3).
Second, being always mindful of the practical consideration of computer
time and our need to achieve multiple successful runs in order to produce
performance curves, we must consider the fact that each additional available
dummy variable in a function-defining branch slows genetic programming.
Our overarching purpose is, of course, to evolve the contents of the functional
subunits and the result-producing branch. The architectural choices
merely establish a loose framework in which the actual work-performing code
can evolve. Although we generally favor as few constraints of any kind as
possible on the evolutionary process, this second consideration suggests a
choice of two, rather than three, arguments.
Third, since we intend to do a comparative analysis of the even-3-, 4-, 5-,
and 6-parity problems, we prefer to have a formula that applies to all four
situations, rather than a series of unrelated decisions. If the even-3-parity problem
were the only problem under consideration, the first consideration (flexibility)
would clearly outweigh the second (practicality) because the amount
of computer time involved in the even-3-parity problem is insignificant. Programs
tend to be larger as the number of arguments increases. Moreover, the
number of fitness cases is 2^k, where k is the arity of the problem. Thus,
six-argument automatically defined functions are very time-consuming in the
context of the 6-parity problem. We therefore decided against a formula calling
for automatically defined functions of arity k for the even-k-parity problem
since that formula would mandate using six-argument automatically defined
functions for the even-6-parity problem. Given the alternatives and constraints,
the exclusion of that formula resulted in our adoption of a formula calling for
automatically defined functions of arity k-1 for the even-3-, 4-, 5-, and 6-parity problems.
Because we are interested in evolving hierarchical automatically defined
functions, we decided that each program in the population should have more
than one function-defining branch. Since we envisaged (incorrectly, as it turned
out) that a solution of the even-5-parity problem would usually involve
lower-order parity functions, two function-defining branches appeared to be
sufficient for the even-6-parity problem (and presumably also for the even-3-, 4-,
and 5-parity problems). Of course, three function-defining branches might
also have been a good choice. The availability of an additional function-defining
branch does not create any requirement that it actually be used. However,
each additional function-defining branch slows genetic programming.
Therefore, we decided that each individual overall program in the population
will consist of two function-defining branches. Since we are interested in
evolving hierarchical solutions, we also decided that defined function ADF1
can refer hierarchically to ADF0.
The first major step in preparing to use genetic programming is to identify
the set of terminals and the second major step is to identify the function set.
When automatically defined functions are involved, these two steps must be
performed separately for each branch of the overall program. For this problem,
each of the three branches is composed of different ingredients.
We first consider the first function-defining branch for the even-3-parity
problem.
The function set, F_adf0, for ADF0 consists only of the set, F, of primitive
functions for the parity problem, namely
F_adf0 = {AND, OR, NAND, NOR}
with an argument map of
{2, 2, 2, 2}.
The terminal set, T_adf0, for ADF0 consists of two dummy variables and is
T_adf0 = {ARG0, ARG1}.
The function-defining branch ADF0 is a composition of primitive functions
from the function set, F_adf0, and terminals from the terminal set, T_adf0.
We next consider the second function-defining branch.
The function set, F_adf1, for ADF1 consists of the union of the set, F, of primitive
functions for the parity problem and the now-defined function ADF0,
thereby enabling the function-defining branch for ADF1 to refer hierarchically
to the now-defined function ADF0. That is,
F_adf1 = {ADF0, AND, OR, NAND, NOR}
with an argument map of
{2, 2, 2, 2, 2}.
The terminal set, T_adf1, for ADF1 consists of two dummy variables and is
T_adf1 = {ARG0, ARG1}.
Note that although we use the same names, ARG0 and ARG1, for the dummy
variables of both ADF0 and ADF1, these dummy variables are only defined
locally within a particular automatically defined function.
The function-defining branch ADF1 is a composition of functions from the
function set, F_adf1, and terminals from the terminal set, T_adf1.
Note that the actual variables of the problem (i.e., D0, D1, and D2) do not
appear in either function-defining branch.
We now consider the result-producing branch.
The function set, F_rpb, of the result-producing branch contains the four
primitive Boolean functions from F and the two automatically defined
functions ADF0 and ADF1. That is,
F_rpb = {ADF0, ADF1, AND, OR, NAND, NOR}
with an argument map of
{2, 2, 2, 2, 2, 2}.
The terminal set, T_rpb, for the result-producing branch consists of the three
actual variables of the even-3-parity problem, so
T_rpb = {D0, D1, D2}.
Note that the result-producing branch does not contain any dummy variables, such as ARG0 or ARG1.
The result-producing branch is a composition of functions from the function set, F_rpb, and terminals from the terminal set, T_rpb.
When the overall program is evaluated, the progn evaluates each branch
in sequence. The function definition for ADF0 is evaluated first. Then, the
function definition for ADF1 is evaluated. ADF1 may contain a reference
to ADF0, which, by this time, has already been defined. Finally, the
result-producing branch is evaluated. This branch may contain references to both
ADF0 and ADF1, which have both, by this time, been defined. The value
returned by the overall program consists only of the value returned by the
last argument of the progn (i.e., the result of evaluating the result-producing branch).
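The branch-by-branch function sets, terminal sets, and evaluation order described above can be sketched as a small interpreter. This is only an illustration under stated assumptions: programs are represented as nested Python tuples rather than LISP S-expressions, the names check_branch, evaluate, run_program, and PROGRAM are hypothetical, and the example program is hand-written, not one produced by a run.

```python
from itertools import product

PRIMITIVES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
}

# Function and terminal sets for each branch, as listed in the text.
FUNCTION_SETS = {
    "ADF0": {"AND", "OR", "NAND", "NOR"},
    "ADF1": {"ADF0", "AND", "OR", "NAND", "NOR"},
    "RPB": {"ADF0", "ADF1", "AND", "OR", "NAND", "NOR"},
}
TERMINAL_SETS = {
    "ADF0": {"ARG0", "ARG1"},
    "ADF1": {"ARG0", "ARG1"},
    "RPB": {"D0", "D1", "D2"},
}

def check_branch(branch, tree):
    """Raise ValueError if tree uses a symbol outside the sets of its branch."""
    if isinstance(tree, str):
        if tree not in TERMINAL_SETS[branch]:
            raise ValueError(f"{tree} is not a terminal of {branch}")
        return
    head, *args = tree
    # Every function in this problem takes exactly two arguments.
    if head not in FUNCTION_SETS[branch] or len(args) != 2:
        raise ValueError(f"{head} is not usable in {branch}")
    for arg in args:
        check_branch(branch, arg)

def evaluate(tree, env, adfs):
    """Evaluate a tree; an ADF call binds its dummy variables locally."""
    if isinstance(tree, str):
        return env[tree]
    head, *args = tree
    values = [evaluate(arg, env, adfs) for arg in args]
    if head in PRIMITIVES:
        return PRIMITIVES[head](*values)
    local_env = {f"ARG{i}": value for i, value in enumerate(values)}
    return evaluate(adfs[head], local_env, adfs)

def run_program(program, d0, d1, d2):
    """Check all three branches, then return the value of the result-producing branch."""
    for branch in ("ADF0", "ADF1", "RPB"):
        check_branch(branch, program[branch])
    adfs = {"ADF0": program["ADF0"], "ADF1": program["ADF1"]}
    return evaluate(program["RPB"], {"D0": d0, "D1": d1, "D2": d2}, adfs)

# Hand-written example: ADF0 is even-2-parity and ADF1 simply calls ADF0.
PROGRAM = {
    "ADF0": ("OR", ("AND", "ARG0", "ARG1"), ("NOR", "ARG0", "ARG1")),
    "ADF1": ("ADF0", "ARG0", "ARG1"),
    "RPB": ("ADF0", ("ADF1", "D0", "D1"), ("NAND", "D2", "D2")),
}
results = {case: run_program(PROGRAM, *case)
           for case in product([False, True], repeat=3)}
```

Because ADF1 is absent from the function set of ADF0, check_branch rejects any ADF0 body that tries to call ADF1, which mirrors the one-directional hierarchical reference chosen for this problem.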
One might include the actual variables of the problem (i.e., the terminals
from the terminal set, T) in the terminal sets of the function-defining
branches. Although there are specific problems for which it may be desirable
to give the function-defining branches direct access to some or all of
the actual variables, we generally view such inclusion as inconsistent with
the goal of encouraging generality in the function-defining branches.
Accordingly, our convention in this book is not to include the actual variables of the problem in the function-defining branches.
Table 6.3 summarizes the key features of the problem of symbolic regression of the even-3-parity function with automatically defined functions.
In what follows, genetic programming will be allowed to evolve a function
definition in each of the two function-defining branches of each program and
then, at its discretion, to call one, both, or neither of these automatically defined
functions. We do not specify what program tree will be evolved in the
function-defining branches. We do not specify whether the defined functions will
actually be used (it being possible, as we have already seen, to solve this
problem without automatically defined functions by evolving the entire program
in the result-producing branch). We do not require that a function-defining
branch refer to or use all of its available dummy variables. We do not
require that the second automatically defined function actually refer to the
first automatically defined function. We do not require that either automatically
defined function be useful; an automatically defined function may, for
example, duplicate a primitive function that is already available in the
function set of the result-producing branch. We do not
require that the automatically defined functions be different from one another.
Instead, the structure of all three branches is determined by the combined
effect, over many generations, of the selective pressure exerted by the
fitness measure and by the effects of the operations of Darwinian reproduction and crossover.
Table 6.3 Tableau with ADFs for the even-3-parity problem.
Objective: Find a program that produces the value of the Boolean even-3-parity function as its output when given the values of the three independent variables as its input.
Architecture of the overall program with ADFs: One result-producing branch and two two-argument function-defining branches, with ADF1 hierarchically referring to ADF0.
Parameters: Branch typing.
Terminal set for the result-producing branch: D0, D1, and D2.
Function set for the result-producing branch: ADF0, ADF1, AND, OR, NAND, and NOR.
Terminal set for the function-defining branch ADF0: ARG0 and ARG1.
Function set for the function-defining branch ADF0: AND, OR, NAND, and NOR.
Terminal set for the function-defining branch ADF1: ARG0 and ARG1.
Function set for the function-defining branch ADF1: AND, OR, NAND, NOR, and ADF0 (hierarchical reference to ADF0 by ADF1).
An enormous amount of computer time can be saved on Boolean problems
with various optimization techniques. One technique involves identifying
the particular Boolean function performed by the bodies of ADF0 and
ADF1 (using the numbering scheme previously described for Boolean functions);
creating a lookup table for automatically defined functions; and then
using the lookup tables in lieu of evaluating the entire body of the ADF for
each fitness case. This technique is used on the even-6-parity problem.
Figure 6.9 100%-correct best-of-run program from generation 2 of a run of the even-3-parity
problem with ADFs.
A second optimization technique is to identify the Boolean function using
the numbering scheme described in section 6.1, convert the bodies of ADF0
and ADF1 into disjunctive normal form (DNF), and then compile and cache
each different function. The number of different cached functions grows with
problem size and there is a corresponding dropoff in the number of references
to each such cached function. Consequently, this technique is used only
on the 3-, 4-, and 5-parity problems.
These optimizations accelerate runs of the parity problem in this chapter
by between one and two orders of magnitude.
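The lookup-table technique can be sketched as follows. This is not the book's implementation. The bit ordering is an assumption made for the sketch: row i of the truth table assigns bit j of i to ARGj, and a true value in row i contributes 2^i to the rule number. That ordering reproduces the rule numbers quoted for specific ADFs in this chapter (for example, rule 165 for the even-2-parity of ARG0 and ARG2 among three arguments). The names truth_table, rule_number, make_lookup, and adf_body are hypothetical.

```python
def truth_table(func, arity):
    """Return func's outputs, where row i sets ARGj to bit j of i (assumed ordering)."""
    return [bool(func(*[bool((i >> j) & 1) for j in range(arity)]))
            for i in range(2 ** arity)]

def rule_number(func, arity):
    """Decimal rule number: a true output in row i contributes 2**i."""
    return sum(1 << i for i, value in enumerate(truth_table(func, arity)) if value)

def make_lookup(func, arity):
    """Evaluate the body once per input combination and return a table-backed replacement."""
    table = truth_table(func, arity)

    def lookup(*args):
        index = sum(1 << j for j, arg in enumerate(args) if arg)
        return table[index]

    return lookup

def adf_body(arg0, arg1, arg2):
    # Stand-in for an evolved three-argument body; ignores arg1 and
    # computes the even-2-parity of arg0 and arg2.
    return (not (arg0 or arg2)) or (arg0 and arg2)

fast_adf = make_lookup(adf_body, 3)
print(rule_number(adf_body, 3))
```

The table is built once per individual, so each later call costs a single index computation regardless of how large the evolved body is.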
6.10 EVEN-3-PARITY WITH ADFs
In one run (out of 33 runs) of the even-3-parity problem with automatically
defined functions, genetic programming discovered the following 21-point
program in generation 2 with a perfect value of raw fitness of 8:
(progn (defun ADF0 (ARG0 ARG1)
         (values (NOR (AND ARG1 ARG0) (NOR ARG0 ARG1))))
       (defun ADF1 (ARG0 ARG1)
         (values (OR (NOR ARG1 ARG0) (NOR ARG1 ARG1))))
       (values (ADF0 (ADF0 D1 D0) (NOR D2 D2)))).
Figure 6.9 shows this 100%-correct best-of-run individual with automatically
defined functions from generation 2 as a rooted, point-labeled tree with
ordered branches. The function-defining branches are on the left and in the
middle of this figure and the result-producing branch is on the right.
The first branch of this best-of-run program is a function definition for
the two-argument ADF0, which, when simplified, is equivalent to the
ODD-2-PARITY function (XOR).
The second branch defines the two-argument ADF1. Although ADF1
potentially may refer hierarchically to ADF0, this particular ADF1 does not
refer to ADF0. However, ADF1's lack of references to ADF0 hardly matters
since ADF1 is not called by the result-producing branch.
The result-producing branch of this best-of-run individual contains two
references to ADF0 in nested form. Upon substitution of ODD-2-PARITY for
ADF0, it becomes
(ODD-2-PARITY (ODD-2-PARITY D1 D0) (NOR D2 D2)).
When simplified, this is equivalent to
(ODD-2-PARITY (ODD-2-PARITY D1 D0) (NOT D2)),
which is a correct solution to the even-3-parity problem.
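The claim can be checked mechanically. The sketch below transcribes the three branches of the best-of-run program into Python (an illustration only, with T and NIL taken as True and False) and counts the fitness cases on which the result-producing branch agrees with even-3-parity. The helper nor and the variable names are choices made here.

```python
from itertools import product

def nor(a, b):
    return not (a or b)

def adf0(arg0, arg1):
    # (NOR (AND ARG1 ARG0) (NOR ARG0 ARG1)), i.e. ODD-2-PARITY.
    return nor(arg1 and arg0, nor(arg0, arg1))

def adf1(arg0, arg1):
    # (OR (NOR ARG1 ARG0) (NOR ARG1 ARG1)); defined but never called below.
    return nor(arg1, arg0) or nor(arg1, arg1)

def rpb(d0, d1, d2):
    # (ADF0 (ADF0 D1 D0) (NOR D2 D2))
    return adf0(adf0(d1, d0), nor(d2, d2))

cases = list(product([False, True], repeat=3))
# Raw fitness is the number of fitness cases handled correctly.
raw_fitness = sum(rpb(*case) == (sum(case) % 2 == 0) for case in cases)
# ADF0 should behave as exclusive-or on its two dummy variables.
adf0_is_xor = all(adf0(a, b) == (a != b) for a, b in product([False, True], repeat=2))
print(raw_fitness, adf0_is_xor)
```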
This solution evolved by genetic programming can be interpreted as a hierarchical
decomposition of the problem. First, genetic programming discovered a
decomposition of the overall problem involving the ODD-2-PARITY subproblem.
Second, genetic programming solved the subproblem by evolving a 100%-correct
Boolean expression for the ODD-2-PARITY function in the body of
ADF0. Third, genetic programming assembled the results of solving the
ODD-2-PARITY subproblems into a solution of the overall even-3-parity problem
by invoking ADF0 twice in a nested way in the result-producing branch.
Note that we did not specify in advance that ADF0 would be used to define
the ODD-2-PARITY function, as opposed to, say, the if-then function, the
even-2-parity function, or some other Boolean function. We did not specify that the
ODD-2-PARITY function would be defined in the first branch as opposed to
the second branch. We did not specify that the second branch would be ignored by the result-producing branch.
The average structural complexity, S_with, of the 100%-correct programs from
the 33 successful runs of the even-3-parity problem is 48.2 points with automatically
defined functions (versus a value of S_without of 44.6 without automatically
defined functions). That is, the even-3-parity problem is on the
non-beneficial side of the breakeven point for average structural complexity.
Figure 6.10 presents the performance curves based on the 33 runs for the
even-3-parity problem with automatically defined functions. The cumulative
probability of success, P(M,i), is 94% by generation 2 and is 100% by generation 3.
The two numbers in the oval indicate that if this problem is run through
to generation 3, processing a total of E_with = 64,000 individuals (i.e., 16,000 x 4
generations x 1 run) is sufficient to yield a solution to this problem with 99%
probability. The fact that P(M,i) is 39% for generation 0 is discussed in
chapter 26.
The 96,000 individuals that must be processed for the even-3-parity problem
without automatically defined functions (as shown in figure 6.2) is 1.5
times the 64,000 individuals needed with automatically defined functions.
That is, the even-3-parity problem is on the beneficial side of the breakeven
point for computational effort.
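The arithmetic behind the 64,000 figure can be sketched as below. The product M x (i + 1) x R is taken from the text; the formula used for the number of runs R(z), namely the smallest integer not less than log(1 - z) / log(1 - P(M,i)), is the customary definition in the performance-curve methodology and is assumed here because this section states only the resulting product. The function names are hypothetical.

```python
import math

def runs_required(p_success, z=0.99):
    """Independent runs needed so that at least one succeeds with probability z."""
    if p_success >= 1.0:
        return 1  # every run succeeds by this generation
    if p_success <= 0.0:
        raise ValueError("no successes observed; the number of runs is unbounded")
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def individuals_to_process(population_size, generation, p_success, z=0.99):
    """M x (i + 1) x R(z): individuals processed when runs go through generation i."""
    return population_size * (generation + 1) * runs_required(p_success, z)

# Even-3-parity with ADFs: M = 16,000 and P(M,i) = 100% by generation 3.
effort_with_adfs = individuals_to_process(16_000, 3, 1.0)
print(effort_with_adfs)
```

Computational effort is the minimum of this quantity over all generations, so in practice the function would be evaluated at each generation of the measured P(M,i) curve and the smallest value kept.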
Figure 6.10 Performance curves for the even-3-parity problem showing that E_with = 64,000
with ADFs.
Table 6.4 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for this problem with
automatically defined functions and without them.
Table 6.4 Comparison table for the even-3-parity problem.
                                   Without ADFs    With ADFs
Average structural complexity S    44.6            48.2
Computational effort E             96,000          64,000
Figure 6.11 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 0.92 and an efficiency ratio, R_E, of 1.50.
Figure 6.11 Summary graphs for the even-3-parity problem.
Figure 6.12 100%-correct best-of-run program from generation 4 of a run of the even-4-parity
problem with ADFs.
6.11 EVEN-4-PARITY WITH ADFs
Since the even-4-parity function takes four arguments, our formula for making
the architectural choices specifies that both automatically defined functions
take three dummy variables. Note that the mere availability of a dummy
variable to an automatically defined function does not create any requirement
that the function either refer to that argument or use it in any meaningful way.
In one run (out of 18 runs) of the even-4-parity problem with automatically
defined functions, genetic programming discovered the following program
containing 28 points with a perfect value of raw fitness of 16 in generation 4:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (OR (NOR (OR ARG0 ARG0) (AND ARG2 ARG2)) (AND
          ARG2 ARG0))))
       (defun ADF1 (ARG0 ARG1 ARG2)
         (values (ADF0 ARG1 ARG0 ARG0)))
       (values (ADF1 (ADF1 D2 D0 D3) (ADF0 D1 D2 D3) (ADF0 D1 D2
        D1)))).
Figure 6.12 depicts this 100%-correct best-of-run program as a rooted, point-labeled
tree with ordered branches.
The function definition for ADF0 is equivalent to the three-argument Boolean
rule 165 which, when simplified, is (EVEN-2-PARITY ARG0 ARG2). That is,
ADF0 ignores its second dummy variable (ARG1) and then performs the even-2-parity
function on its first and third dummy variables, ARG0 and ARG2.
Figure 6.13 Simplified form of the result-producing branch of one solution to the even-4-parity
problem with ADFs.
A function is considered to be a parity rule if its overall behavior exactly
matches that of the even or odd parity function on any subset of two or more
of its arguments (or their negations). For example, ADF0 above is a parity rule,
as would be a four-argument automatically defined function that has the behavior of (ODD-2-PARITY (NOT ARG2) ARG3).
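The definition of a parity rule can be turned into a direct test on a rule number. The sketch below assumes the same bit ordering that reproduces rule 165 for ADF0 above (row i of the truth table assigns bit j of i to ARGj, and a true row i contributes 2^i); the function name is_parity_rule is hypothetical. Negating any argument merely swaps even and odd parity on the subset, so comparing against both covers the "or their negations" clause.

```python
from itertools import combinations

def is_parity_rule(rule, arity):
    """True if the rule equals even or odd parity on some subset of two or more arguments."""
    rows = range(2 ** arity)
    table = [(rule >> i) & 1 for i in rows]
    for size in range(2, arity + 1):
        for subset in combinations(range(arity), size):
            # Odd parity of the chosen arguments for every truth-table row.
            odd = [sum((i >> j) & 1 for j in subset) % 2 for i in rows]
            even = [1 - bit for bit in odd]
            if table == odd or table == even:
                return True
    return False

# Rule 165 (three arguments) is the ADF0 discussed above.
print(is_parity_rule(165, 3))
# A constant function such as four-argument rule 65,535 is not a parity rule.
print(is_parity_rule(65535, 4))
```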
The function definition for ADF1 ignores its third dummy variable (ARG2)
and is equivalent to (EVEN-2-PARITY ARG1 ARG0).
Thus, the result-producing branch can be simplified to
(EVEN-2-PARITY (EVEN-2-PARITY D2 D0) (EVEN-2-PARITY D1 D3)),
which is a solution to the even-4-parity problem.
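This simplification can be confirmed by transcribing the 28-point program into Python. The sketch is illustrative only; it reads the second argument of the NOR in ADF0 as (AND ARG2 ARG2), takes T and NIL as True and False, and assumes the rule-numbering convention in which row i of the truth table assigns bit j of i to ARGj and contributes 2^i when true. The helper names are choices made here.

```python
from itertools import product

def nor(a, b):
    return not (a or b)

def adf0(arg0, arg1, arg2):
    # (OR (NOR (OR ARG0 ARG0) (AND ARG2 ARG2)) (AND ARG2 ARG0))
    return nor(arg0 or arg0, arg2 and arg2) or (arg2 and arg0)

def adf1(arg0, arg1, arg2):
    # (ADF0 ARG1 ARG0 ARG0)
    return adf0(arg1, arg0, arg0)

def rpb(d0, d1, d2, d3):
    # (ADF1 (ADF1 D2 D0 D3) (ADF0 D1 D2 D3) (ADF0 D1 D2 D1))
    return adf1(adf1(d2, d0, d3), adf0(d1, d2, d3), adf0(d1, d2, d1))

def eqv(a, b):
    # Even-2-parity (equivalence).
    return a == b

cases = list(product([False, True], repeat=4))
raw_fitness = sum(rpb(*case) == (sum(case) % 2 == 0) for case in cases)
# Compare against the simplified form given in the text.
matches_simplified = all(
    rpb(d0, d1, d2, d3) == eqv(eqv(d2, d0), eqv(d1, d3))
    for d0, d1, d2, d3 in cases
)
# Rule number of ADF0 under the assumed bit ordering.
adf0_rule = sum(
    1 << i for i in range(8)
    if adf0(bool(i & 1), bool(i & 2), bool(i & 4))
)
print(raw_fitness, matches_simplified, adf0_rule)
```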
Figure 6.13 depicts this simplified form of the result-producing branch of
the above best-of-run program from generation 4 as a rooted, point-labeled
tree with ordered branches. EQV denotes the even-2-parity (equivalence)
function.
The solutions from two other runs are noteworthy.
First, the following best-of-run individual from generation 5 containing
53 points is interesting in that ADF1 merely permuted the order of the three
arguments of ADF0:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (OR (OR (NOR ARG1 ARG0) (AND ARG0 ARG1)) (NOR
          (NAND ARG0 ARG2) (NAND (NAND (OR ARG0 ARG0) (OR ARG1
          ARG0)) (NAND (AND ARG0 ARG2) (NAND ARG0 ARG1)))))))
       (defun ADF1 (ARG0 ARG1 ARG2)
         (values (ADF0 ARG1 ARG2 ARG0)))
       (values (OR (ADF0 (ADF1 D0 D2 D0) (ADF1 D2 D3 D1) (ADF1 D0
        D3 D0)) (NOR (OR D2 D0) (ADF1 D3 D1 D1))))).
Although all three of ADF0's dummy variables appear in the body of ADF0,
only two of them (ARG0 and ARG1) actually affect the value returned by
ADF0. ADF0 is equivalent to three-argument Boolean rule 153, which is
(EVEN-2-PARITY ARG0 ARG1). ADF1 is, in turn, equivalent to
(EVEN-2-PARITY ARG1 ARG2).
In a different run, ADF1 in the following best-of-run 123-point program
from generation 9 always returns the constant NIL:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (OR (NAND (OR (OR ARG1 ARG1) (OR ARG0 ARG1))
          (NAND (AND ARG0 ARG1) (OR (AND (NAND ARG0 ARG1) (NOR
          ARG0 ARG1)) ARG0))) (NOR (NAND (OR ARG1 ARG2) (NOR
          ARG2 ARG0)) (NOR (AND ARG2 ARG0) (AND ARG0 ARG0))))))
       (defun ADF1 (ARG0 ARG1 ARG2)
         (values (NOR (OR (NAND (OR ARG0 ARG2) (AND ARG2 ARG0))
          (NAND (NOR (NAND ARG1 ARG0) ARG0) (ADF0 (AND (NOR
          ARG0 ARG2) ARG2) ARG0 (NAND ARG2 ARG0)))) (OR (NOR
          ARG2 ARG1) (NAND (NOR ARG0 ARG1) (NAND ARG0
          ARG2))))))
       (values (NOR (AND (ADF1 (ADF0 D3 D1 D0) (OR D1 D3) (ADF1
        D1 D2 D0)) (OR (OR D1 D2) (ADF0 D1 D2 D3))) (ADF0 (ADF0
        (NAND D2 D2) (NOR D0 D0) (AND D0 D0)) (NAND (ADF0 D1 D3
        D0) (ADF0 D3 D1 D2)) (NOR (AND D2 D1) (ADF0 D2 D2
        D2)))))).
The average structural complexity, S_with, of the 100%-correct programs from
the 18 successful runs (out of 18 runs) of the even-4-parity problem is 60.1
points with automatically defined functions (versus a value of S_without of 112.6
without automatically defined functions).
Recall that the average structural complexity, S_with, of solutions to the
even-3-parity problem was 48.2 points with automatically defined functions
(versus 44.6 without them). Automatically defined functions are not beneficial as
to average structural complexity for the even-parity problem of order 3, but
they are beneficial for the even-parity problem of order 4. Thus, for the
even-parity problem, the breakeven point for average structural complexity appears
to be between three and four.
Figure 6.14 presents the performance curves based on these 18 runs for the
even-4-parity problem with automatically defined functions. The cumulative
probability of success, P(M,i), is 56% by generation 5 and is 100% by generation 10.
The two numbers in the oval indicate that if this problem is run through
to generation 10, processing a total of E_with = 176,000 individuals (i.e., 16,000
x 11 generations x 1 run) is sufficient to yield a solution to this problem with
99% probability.
Figure 6.14 Performance curves for the even-4-parity problem showing that E_with = 176,000
with ADFs.
The 384,000 individuals that must be processed for the even-4-parity problem
without automatically defined functions (as shown in figure 6.3) is 2.18
times the 176,000 individuals needed with automatically defined functions.
Table 6.5 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the even-4-parity problem
with automatically defined functions and without them.
Table 6.5 Comparison table for the even-4-parity problem.
                                   Without ADFs    With ADFs
Average structural complexity S    112.6           60.1
Computational effort E             384,000         176,000
Figure 6.15 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 1.87 and an efficiency ratio, R_E, of 2.18.
Figure 6.15 Summary graphs for the even-4-parity problem.
6.12 EVEN-5-PARITY WITH ADFs
Four dummy variables are available to both ADF0 and ADF1 for the even-5-parity problem.
All 19 runs that we made of the even-5-parity problem using automatically
defined functions produced 100%-correct solutions.
In one run, genetic programming discovered the following 51-point program
with a perfect value of raw fitness of 32 in generation 9:
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
         (values (AND (NAND ARG2 ARG0)
                      (AND (NAND ARG2 ARG0)
                           (OR ARG2 ARG0)))))
       (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
         (values (ADF0 (NOR ARG2 ARG2)
                       (NAND (OR ARG2 ARG0) (NOR ARG1 ARG0))
                       (ADF0 ARG1 ARG3 ARG3 ARG3)
                       (ADF0 ARG1 ARG1 ARG3 ARG3))))
       (values (ADF1 (ADF0 D1 D1 D0 D4) (ADF0 D3 D4 D2 D2)
                     (ADF0 D0 D2 D4 D1) (OR D1 D1)))).
The result-producing branch of this program calls on ADF0 and ADF1 to
produce the even-5-parity function.
ADF0 is equivalent to the four-argument Boolean rule 23,130 which, in turn,
is equivalent to (ODD-2-PARITY ARG0 ARG2).
ADF1 hierarchically invokes ADF0 and is equivalent to the four-argument
Boolean rule 15,555. When simplified, ADF1 is equivalent to the following
three-argument combination of the even-2-parity function and the
odd-2-parity function:
(EVEN-2-PARITY ARG1 (ODD-2-PARITY ARG2 ARG3)).
In other words, the two function-defining branches in this particular
100%-correct solution define lower-order parity functions, one with two
arguments and one with three.
If we substitute these definitions into the result-producing branch, we find
that it simplifies to:
(EVEN-3-PARITY (ODD-2-PARITY D3 D2) (ODD-2-PARITY D0 D4) D1),
which mimics the target even-5-parity function.
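The rule numbers and the simplified form can be checked with a short transcription of the 51-point program. This is an illustrative sketch, not the book's code. It takes T and NIL as True and False and assumes the rule-numbering convention in which row i of the truth table assigns bit j of i to ARGj and a true row contributes 2^i. The names nand, nor, and rule_number are introduced here.

```python
from itertools import product

def nand(a, b):
    return not (a and b)

def nor(a, b):
    return not (a or b)

def adf0(a0, a1, a2, a3):
    # (AND (NAND ARG2 ARG0) (AND (NAND ARG2 ARG0) (OR ARG2 ARG0)))
    return nand(a2, a0) and (nand(a2, a0) and (a2 or a0))

def adf1(a0, a1, a2, a3):
    # ADF1 invokes ADF0 three times, once at the root and twice as arguments.
    return adf0(nor(a2, a2),
                nand(a2 or a0, nor(a1, a0)),
                adf0(a1, a3, a3, a3),
                adf0(a1, a1, a3, a3))

def rpb(d0, d1, d2, d3, d4):
    # (ADF1 (ADF0 D1 D1 D0 D4) (ADF0 D3 D4 D2 D2) (ADF0 D0 D2 D4 D1) (OR D1 D1))
    return adf1(adf0(d1, d1, d0, d4), adf0(d3, d4, d2, d2),
                adf0(d0, d2, d4, d1), d1 or d1)

def rule_number(func, arity):
    """Decimal rule number under the assumed bit ordering."""
    return sum(
        1 << i for i in range(2 ** arity)
        if func(*[bool((i >> j) & 1) for j in range(arity)])
    )

cases = list(product([False, True], repeat=5))
raw_fitness = sum(rpb(*case) == (sum(case) % 2 == 0) for case in cases)
print(raw_fitness, rule_number(adf0, 4), rule_number(adf1, 4))
```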
Whenever automatically defined functions are used to solve a problem, we
can view the problem as having been decomposed into subproblems. In this
decomposition, the result-producing branch solves the overall problem and
the function-defining branches solve the subproblems. The subproblems in
this particular run involve Boolean rules 23,130 and 15,555. Genetic programming
creates one computer program for rule 23,130 and a second program
for rule 15,555 in the bodies of the two function-defining branches. Genetic
programming also creates a solution to the overall even-5-parity problem in
the body of the result-producing branch. The solution to the second
subproblem (rule 15,555) hierarchically invokes the solution to the
already-solved first subproblem (rule 23,130) on three occasions.
Figure 6.16 shows the directed acyclic graph of references in which ADF0 is
used to define ADF1 and in which ADF0 and ADF1 are used together to define
the even-5-parity function.
Figure 6.16 Hierarchical arrangement of function definitions for ADF0 and ADF1 employed
by the best-of-run program from generation 9 for the even-5-parity problem with ADFs.
A human programmer would probably decompose a high-order parity
problem into lower-order parity problems. However, we were surprised
to find that genetic programming does not usually solve parity problems
by means of lower-order parity functions. The single 100%-correct solution
above (and the solutions to the even-3-parity and even-4-parity problems
cited in the previous sections) should not lead the reader into thinking
that genetic programming with automatically defined functions mimics
the style of human programmers. In fact, the run described above is not
typical at all; it is the only run of 19 runs in which genetic programming
solved the even-5-parity problem using a hierarchical composition of two
lower-order parity functions. In seven of the other runs, one of the automatically
defined functions is a lower-order parity function, but the other
is not. In a majority of the 19 runs, neither of the automatically defined
functions is a lower-order parity function.
The following 233-point program is an example of one of the 11 solutions
from the 19 runs that does not use any lower-order parity functions:
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
         (values (AND (OR (NOR (NAND ARG1 ARG3) (AND (OR (AND
          ARG3 ARG3) (AND (OR (AND ARG3 ARG2) (NAND ARG2 ARG3))
          ARG2)) (OR (NOR ARG3 ARG1) (NOR ARG0 ARG3)))) (NAND
          (AND ARG0 ARG3) (OR ARG2 ARG1))) (AND (NAND (NOR ARG1
          ARG2) (NAND ARG1 ARG2)) (OR (NAND ARG2 ARG1) (NAND
          ARG1 ARG3))))))
       (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
         (values (ADF0 (OR (NOR ARG1 ARG1) (ADF0 (NOR ARG2 ARG2)
          (AND ARG2 ARG0) (NOR ARG2 ARG2) (AND ARG3 ARG1)))
          (AND (NAND (AND ARG0 ARG1) (OR ARG1 ARG1)) (ADF0 (ADF0
          ARG1 ARG0 ARG0 ARG1) (ADF0 ARG1 ARG1 ARG0 ARG3)
          (NOR ARG3 ARG0) (NAND ARG1 ARG0))) (AND (ADF0 (NOR
          ARG2 ARG1) (NOR ARG2 ARG2) (NAND ARG0 ARG0) (ADF0
          ARG3 ARG0 ARG0 ARG2)) (AND (NOR ARG0 ARG0) (NOR ARG3
          ARG0))) (ADF0 (ADF0 (NOR ARG1 ARG1) (OR ARG1 ARG1)
          (AND ARG0 ARG1) (NAND ARG3 ARG1)) (NOR (ADF0 ARG1
          ARG2 ARG1 ARG0) (AND ARG0 ARG3)) (OR (NAND ARG3 ARG2)
          (AND ARG1 ARG1)) (NOR (NAND ARG3 ARG0) (OR ARG0
          ARG0))))))
       (values (NAND (ADF1 (ADF0 (NOR D1 D3) (OR D0 D0) (ADF1 D1
        D3 D3 D2) (OR (AND (AND D0 D0) (NAND D1 D4)) (NAND
        (ADF0 D3 D0 D4 D0) (NOR (ADF0 D3 D2 D2 D3) (NOR D1
        D0))))) (OR (ADF1 D4 D2 D4 D3) (ADF1 D0 D0 D1 D3)) (AND
        (ADF1 D1 D3 D3 D2) (OR D4 D1)) (NOR (NOR D3 D4) (OR D0
        D4))) (OR (NAND D3 D2) (NAND (ADF0 D3 D0 D4 D0) (NOR
        (AND D0 D0) (NOR D1 D0))))))).
In this program ADF0 is the four-argument Boolean rule 7,420 and ADF1 is
the rule 13,159. Neither is a parity rule.
In another of the 19 runs, ADF0 defines Boolean rule 13,260, which is equivalent
to (ODD-2-PARITY ARG1 ARG3), but ADF1 defines rule 65,535 (the
four-argument function returning the constant T). The latter is, of course, not
a parity rule.
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
         (values (AND (AND (OR ARG1 ARG3) (OR (NAND ARG1 ARG1)
          (NAND ARG3 ARG3))) (OR (NAND ARG1 ARG1) (NAND ARG3
          ARG3)))))
       (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
         (values (OR (OR (NAND ARG0 ARG0) (AND ARG2 ARG0)) (ADF0
          (NAND ARG2 ARG3) (ADF0 ARG0 ARG0 ARG0 ARG3) (NOR ARG2
          ARG0) (NAND ARG0 ARG2)))))
       (values (ADF0 (OR (ADF0 D4 D0 D3 D1) (AND D3 D2)) (ADF0
        (OR D3 D2) (ADF0 D0 D3 D3 D1) (ADF1 D1 D2 D1 D2) (ADF0
        D2 D4 D0 D0)) (ADF1 (NOR D1 D0) (NAND D0 D0) (AND D4
        D1) (OR D2 D4)) (NOR D2 D2)))).
In another of the 19 runs, ADF0 defines rule 61,455, which is equivalent to
(EVEN-2-PARITY ARG2 ARG3). ADF1 defines rule 21,845, which is equivalent
to (NOT ARG0) and is not, of course, a parity rule.
In another run, the solution shown below emerged on generation 16:
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
(values (NAND (OR (OR (NOR ARG3 ARG0) (OR ARG1 ARG3))
(NAND (AND ARG2 ARG1) (NOR ARG3 ARG2))) (NOR (AND (OR
ARG3 ARG0) (AND ARG2 ARG0)) (AND (NAND (AND ARG2
ARG1) (NOR ARG3 ARG2)) (OR ARG2 ARG2))))))
(defun ADF1 (ARG0 ARG1 ARG2 ARG3)
(values (NOR (AND (ADF0 (ADF0 ARG0 ARG2 ARG2 ARG0) (OR
(NOR ARG1 (NAND ARG3 ARG1)) ARG2) (ADF0 ARG0 ARG0
ARG0 ARG2) (ADF0 ARG2 ARG3 ARG1 ARG1)) (OR (NOR ARG1
(NOR ARG1 (NAND ARG0 ARG0))) ARG2)) (AND (NAND ARG0
ARG0) (NAND ARG2 ARG2)))))
(values (ADF1 (NAND (ADF0 (NAND D1 D1) (OR D2 D2) (ADF0
D4 D1 D1 D1) (ADF1 D3 D2 D4 D2)) (NOR (NOR (OR D1 D0)
(NAND D1 D1)) (NOR D1 D1))) (ADF1 (NAND (NOR (NOR (OR
D1 D0) (NAND D1 D1)) (NOR D1 D1)) (NAND D1 D4)) (NOR D0
D4) (NAND (OR D1 D4) (NOR D1 D3)) (AND (AND D2 D1) (OR
D4 D0))) (ADF1 (NOR (ADF1 D3 D0 D3 D3) (ADF1 D0 D2 D3
D4)) (AND (NAND D4 D2) (OR D2 D3)) (NOR (ADF1 D3 D0 D3
D3) (ADF1 D2 D0 D4 D4)) (OR (AND D0 D0) (ADF1 D3 D2 D4
D4))) (ADF1 (NAND (ADF1 D0 D2 D1 D2) (AND D4 D1)) (NAND
(OR D3 D0) (ADF0 D4 D2 D0 D2)) (NOR (ADF1 D0 D0 D0 D3)
(ADF0 D2 D2 D4 D0)) (AND D1 D4))))).
In this program, the four-argument ADF0 is merely a projection that returns
the value of dummy variable ARG2.
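The claim that this ADF0 is only a projection can be checked exhaustively over its 16 input combinations. The sketch below is a hand transliteration of the ADF0 body into Python, reading the function names in the listing as AND, OR, NAND, and NOR; the transliteration and the helper names are assumptions of this example and are not part of the original run.

```python
from itertools import product


def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


def adf0(arg0, arg1, arg2, arg3):
    # Hand transliteration of the evolved ADF0 body for this run.
    return nand(
        (nor(arg3, arg0) or (arg1 or arg3))
        or nand(arg2 and arg1, nor(arg3, arg2)),
        nor(
            (arg3 or arg0) and (arg2 and arg0),
            nand(arg2 and arg1, nor(arg3, arg2)) and (arg2 or arg2),
        ),
    )


# Compare ADF0 with its third dummy variable on every input combination.
is_projection = all(
    adf0(*args) == args[2] for args in product([False, True], repeat=4)
)
print(is_projection)  # True
```

The check works because (AND ARG2 ARG1) and (NOR ARG3 ARG2) can never both be true, so the first argument of the outer NAND is always T and the whole expression collapses to ARG2.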
In the solution shown below from yet another of the 19 runs, ADF1 defines
Boolean rule 15,420, which is equivalent to (ODD-2-PARITY ARG1 ARG2),
but ADF0 is a function that merely recreates the NOR function (Boolean rule
1,285) that is already available in the set of primitive functions. NOR is not, of
course, a parity rule.
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
(values (NOR ARG2 ARG0)))
(defun ADF1 (ARG0 ARG1 ARG2 ARG3)
(values (AND (OR (NAND ARG2 ARG2) (NAND ARG2 ARG1)) (OR
ARG2 ARG1))))
(values (ADF1 (NOR (NOR D1 D4) (NAND D1 D3)) (ADF1 (ADF1
D2 D3 D0 D3) (ADF0 D3 D2 D3 D4) (ADF1 D4 D2 D1 D1) (NOR
D4 D0)) (ADF1 (ADF1 D3 D3 D3 D3) D4 D0 D2) (AND (ADF0
D0 D3 D3 D2) (NOR D1 D0))))).
In summary, this sampling from the 19 runs illustrates how an automatically
defined function may
• ignore some of its dummy variables,
• recreate a primitive function that is already in the function set of the
problem,
• be entirely ignored,
• define a constant value, or
• return a value identical to one of the dummy variables.
Table 6.6 shows the characteristics of the 19 solutions to the even-5-parity
problem. Column 2 shows the rule number for the Boolean function defined
by ADF0. Column 3 indicates whether ADF0 is a parity rule. Column 4 shows
the rule number for the Boolean function defined by ADF1. Column 5 indicates
whether ADF1 is a parity rule. As can be seen, only one of the 19 runs
(5%) shown in the table employs two lower-order parity functions. Seven of
the 19 runs (37%) employ a lower-order parity function in exactly one of the
two function-defining branches. The even-5-parity problem is solved in
11 (58%) of the 19 runs without using any lower-order parity function
whatsoever.
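Whether a given four-argument rule is a lower-order parity rule can be decided mechanically by enumerating every parity function over two or more of the four dummy variables. The sketch below is illustrative only. It assumes the bit ordering inferred from the rules quoted in this chapter (ARG0 is the least significant bit of the truth-table row index, and row r contributes bit r of the rule number), and the names `parity_rule_table` and `classify` are hypothetical.

```python
from itertools import combinations


def parity_rule_table(arity=4):
    """Map rule number -> (kind, argument indices) for parities of 2+ arguments."""
    all_ones = 2 ** (2 ** arity) - 1
    table = {}
    for size in range(2, arity + 1):
        for subset in combinations(range(arity), size):
            odd = 0
            for row in range(2 ** arity):
                ones = sum((row >> j) & 1 for j in subset)
                if ones % 2 == 1:
                    odd |= 1 << row
            table[odd] = ("ODD", subset)
            # The even-parity rule is the bitwise complement of the odd one.
            table[odd ^ all_ones] = ("EVEN", subset)
    return table


PARITY_RULES = parity_rule_table()


def classify(rule):
    """Return (kind, argument indices), or None if the rule is not a parity rule."""
    return PARITY_RULES.get(rule)


for rule in (23130, 15555, 13260, 61455, 7420, 13159, 65535):
    print(rule, classify(rule))
```

Rules such as 21,845 (NOT ARG0) and 65,535 (constant T) come back as `None` here because the sketch only counts parities of at least two arguments, which matches how those rules are treated in the text.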
Table 6.6 Characteristics of 19 solutions to the even-5-parity problem.
Run  ADF0   Is ADF0 a parity rule?      ADF1   Is ADF1 a parity rule?
 1   23130  (ODD-2-PARITY ARG0 ARG2)    15555  (EVEN-3-PARITY ARG1 ARG2 ARG3)
 2   01285  No                          15420  (ODD-2-PARITY ARG1 ARG2)
 3   03920  No                          13260  (ODD-2-PARITY ARG1 ARG3)
 4   61455  (EVEN-2-PARITY ARG2 ARG3)   21845  No
 5   13260  (ODD-2-PARITY ARG1 ARG3)    65535  No
 6   04010  No                          21930  (ODD-2-PARITY ARG0 ARG3)
 7   50115  (EVEN-2-PARITY ARG1 ARG2)   13226  No
 8   07420  No                          13159  No
 9   42469  No                          19558  No
10   43600  No                          52392  No
11   61680  No                          43690  No
12   25198  No                          59135  No
13   29199  No                          02176  No
14   14192  No                          55535  No
15   64201  No                          58431  No
16   45061  No                          63481  No
17   40960  No                          53232  No
18   50115  (EVEN-2-PARITY ARG1 ARG2)   13226  No
19   00595  No                          27560  No
While it would be virtually inconceivable for a human programmer to
write subroutines implementing Boolean rules such as 7,420 and 13,159 to
solve the even-5-parity problem, this approach is typical of the majority
of runs of genetic programming with automatically defined functions.
From the point of view of the fitness measure that drives the evolutionary
process, rules 7,420 and 13,159 are just as good as the even-3-parity and
even-2-parity functions in solving the problem at hand.
Of course, genetic programming has no particular attachment to rules 7,420
and 13,159. A glance at table 6.6 indicates that 30 different rules appear in its
19 rows.
Table 6.7 summarizes the percentages of occurrence of the three different
programming motifs that genetic programming evolved for the 19 runs shown
in table 6.6. As can be seen, in a majority of the runs, neither of the
function-defining branches is a parity rule.
Table 6.7 Motifs of the 19 solutions for the even-5-parity problem.
Motif                                                    Percentage of runs
Lower-order parity functions in both ADF0 and ADF1         5%
A lower-order parity function in either ADF0 or ADF1      37%
No lower-order parity function in either ADF0 or ADF1     58%
The following final point should not be overlooked concerning the character
of the solutions produced by these 19 runs: All 19 runs used their
automatically defined functions. As already demonstrated, the even-5-parity
problem can be solved without automatically defined functions using the
four primitive functions and actual variables of the problem contained in the
result-producing branch. Automatically defined functions are available, but
there is no requirement that genetic programming actually use them. For
simpler problems, the result-producing branch may solve the problem on an
occasional one or two runs out of a series of runs without actually invoking
the available automatically defined functions (e.g., one of 19 solutions to the
four-sine problem in table 5.12 did not use the available automatically defined
function). We rarely see this on more difficult problems. In other words, the
individuals in the population that do not actively employ their available
automatically defined functions usually lose the race within the population
to those individuals that actually use their automatically defined functions.
Figure 6.17 Performance curves for the even-5-parity problem showing that $E_{with}$ = 464,000
with ADFs.
The average structural complexity, $\bar{S}_{with}$, of the 100%-correct programs from
the 19 successful runs (out of 19 runs) for the even-5-parity problem is 156.8
points with automatically defined functions (versus a value of $\bar{S}_{without}$ of 299.9).
There is a considerable reduction in the overall size of the programs that solve
this problem with automatically defined functions.
Figure 6.17 presents the performance curves based on the 19 runs for the
even-5-parity problem with automatically defined functions. The cumulative
probability of success, $P(M,i)$, is 63% by generation 15 and 100% by
generation 28. The two numbers in the oval indicate that if this problem is run through
to generation 28, processing a total of $E_{with}$ = 464,000 individuals (i.e., 16,000
x 29 generations x 1 run) is sufficient to yield a solution to this problem with
99% probability.
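The arithmetic behind these effort figures can be sketched as follows. The sketch assumes the definition of the number of independent runs used earlier in the book, $R(z) = \lceil \log(1-z) / \log(1-P(M,i)) \rceil$ with $z$ = 99%, and treats a cumulative success probability of 100% as requiring a single run. The function names are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """Independent runs R(z) needed to reach overall success probability z."""
    if p_success >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def computational_effort(population, generation, p_success, z=0.99):
    """Individuals that must be processed: M x (i + 1) x R(z)."""
    return population * (generation + 1) * runs_required(p_success, z)


# Even-5-parity with ADFs: M = 16,000, 100% success by generation 28.
print(computational_effort(16_000, 28, 1.0))  # 464000
```

The factor `generation + 1` reflects that generation 0 also has to be evaluated, which is why running through generation 28 counts 29 generations.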
Table 6.8 compares the average structural complexity, $\bar{S}_{without}$ and $\bar{S}_{with}$,
and the computational effort, $E_{without}$ and $E_{with}$, for the even-5-parity
problem with automatically defined functions and without them. The 6,528,000
individuals that must be processed for the even-5-parity problem without
automatically defined functions (as shown in figure 6.4) is 14.07 times the
464,000 individuals needed with automatically defined functions.
Table 6.8 Comparison table for the even-5-parity problem.
                                   Without ADFs   With ADFs
Average structural complexity S    299.9          156.8
Computational effort E             6,528,000      464,000
Figure 6.18 summarizes the information in this comparison table and shows
a structural complexity ratio, $R_S$, of 1.91 and an efficiency ratio, $R_E$, of 14.07
for the even-5-parity problem.
Figure 6.18 Summary graphs for the even-5-parity problem.
6.13 EVEN-6-PARITY PROBLEM WITH ADFs
When automatically defined functions are being used, programs for the
even-6-parity problem contain two five-argument functions.
The reader will recall that we were unable to solve the even-6-parity problem
with a population size of 16,000 without automatically defined functions
(section 6.6). However, 90% of the runs with automatically defined functions
solve the problem by generation 50.
The average structural complexity, $\bar{S}_{with}$, of the 100%-correct programs from
the 19 successful runs (out of 21 runs) of the even-6-parity problem is 184.8
points with automatically defined functions.
Figure 6.19 presents the performance curves based on 21 runs for the
even-6-parity problem with automatically defined functions and a population
size of 16,000. The cumulative probability of success, $P(M,i)$, is 90% by
generation 41 and is still 90% by generation 50. The two numbers in the oval
indicate that if this problem is run through to generation 41, processing a
total of $E_{with}$ = 1,344,000 individuals (i.e., 16,000 x 42 generations x 2 runs)
is sufficient to yield a solution to this problem with
99% probability.
Table 6.9 compares the average structural complexity, $\bar{S}_{without}$ and $\bar{S}_{with}$,
and the computational effort, $E_{without}$ and $E_{with}$, for the even-6-parity
problem with automatically defined functions and without them. This table
includes the rough estimates (section 6.6) of 70,176,000 for $E_{without}$ and 328.0
for $\bar{S}_{without}$ for the even-6-parity problem without automatically defined
functions.
Figure 6.19 Performance curves for the even-6-parity problem showing that $E_{with}$ = 1,344,000
with ADFs.
Table 6.9 Comparison table for the even-6-parity problem.
                                   Without ADFs   With ADFs
Average structural complexity S    328.0          184.8
Computational effort E             70,176,000     1,344,000
Figure 6.20 summarizes the information in this comparison table and shows
a structural complexity ratio, $R_S$, of 1.77 and an efficiency ratio, $R_E$, of 52.2
for the even-6-parity problem.
6.14 SUMMARY FOR THE EVEN-3-, 4-, 5-, AND 6-PARITY PROBLEMS
Table 6.10 compiles the observations from the above runs of the even-3-, 4-,
5-, and 6-parity problems into a single table. As can be seen, for the even-3-,
4-, 5-, and 6-parity problems, the efficiency ratio, $R_E$, is greater than 1
(indicating that fewer fitness evaluations are required to yield a solution to the
problem with 99% probability with automatically defined functions than
without them). The even-3-, 4-, 5-, and 6-parity problems are all beyond the
breakeven point for computational effort.
Figure 6.20 Summary graphs for the even-6-parity problem.
Table 6.10 Summary table of the structural complexity ratio, $R_S$, and the efficiency
ratio, $R_E$, for the even-3-, 4-, 5-, and 6-parity problems.
Problem         Structural complexity ratio $R_S$   Efficiency ratio $R_E$
Even-3-parity   0.92                                1.50
Even-4-parity   1.87                                2.18
Even-5-parity   1.91                                14.07
Even-6-parity   1.77                                52.2
The structural complexity ratio, $R_S$, is less than 1 for the even-3-parity
problem, but greater than 1 for the even-4-, 5-, and 6-parity problems. The
breakeven point for average structural complexity for the even-parity problem
appears to be between three and four.
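Both ratios are simple quotients of the "without" value over the "with" value, so a ratio above 1 means that automatically defined functions help. The sketch below recomputes them from the rounded averages and effort values reported in this chapter for a population size of 16,000. Because the inputs are already rounded, the last digit can differ slightly from the tabulated ratio (this happens for the even-3-parity structural complexity ratio). The dictionary and function names are hypothetical.

```python
# Rounded values reported in this chapter, keyed by the order of the parity problem.
S_WITHOUT = {3: 44.6, 4: 112.6, 5: 299.9, 6: 328.0}
S_WITH = {3: 48.2, 4: 60.1, 5: 156.8, 6: 184.8}
E_WITHOUT = {3: 96_000, 4: 384_000, 5: 6_528_000, 6: 70_176_000}
E_WITH = {3: 64_000, 4: 176_000, 5: 464_000, 6: 1_344_000}


def ratios(order):
    """Return (structural complexity ratio R_S, efficiency ratio R_E)."""
    r_s = S_WITHOUT[order] / S_WITH[order]
    r_e = E_WITHOUT[order] / E_WITH[order]
    return r_s, r_e


for order in sorted(S_WITHOUT):
    r_s, r_e = ratios(order)
    print(f"even-{order}-parity: R_S = {r_s:.2f}, R_E = {r_e:.2f}")
```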
6.15 SCALING FOR THE EVEN-3-, 4-, 5-, AND 6-PARITY PROBLEMS
This section considers the question of how the average structural complexity
and the computational effort change as a function of problem size for the
even-parity problem.
We first consider the average structural complexity, $\bar{S}_{without}$ and $\bar{S}_{with}$, of
the genetically evolved solutions to the even-parity problem, with and
without automatically defined functions.
Table 6.11 consolidates the values of the average structural complexity,
$\bar{S}_{without}$ and $\bar{S}_{with}$, of 100%-correct solutions of the even-3-, 4-, 5-, and 6-parity
problems, with and without automatically defined functions, for a population
size of 16,000. The value of 328.0 for $\bar{S}_{without}$ shown in this table for the
even-6-parity problem without automatically defined functions is the rough
estimate from section 6.6.
Table 6.11 Comparison of the average structural complexity of solutions to the
even-3-, 4-, 5-, and 6-parity problems, with and without ADFs.
                      Even-3   Even-4   Even-5   Even-6
$\bar{S}_{without}$   44.6     112.6    299.9    328.0
$\bar{S}_{with}$      48.2     60.1     156.8    184.8
Figure 6.21 shows the average structural complexity, $\bar{S}_{without}$ and $\bar{S}_{with}$,
for the solutions produced on the successful runs of the even-3-, 4-, 5-, and
6-parity problems. The horizontal axis reflects the arity k (the order, number
of arguments, number of input bits) of the problem.
Figure 6.21 Comparison of average structural complexity of solutions to the even-3-, 4-, 5-,
and 6-parity problems, with and without ADFs.
When we perform a linear least-squares regression on the four points for
the runs without automatically defined functions, we find that the structural
complexity, $\bar{S}_{without}$, can be expressed in terms of the number of arguments,
A, as
$\bar{S}_{without} = -270.6 + 103.8A$,
with a correlation of 0.96. The slope indicates that it takes about a hundred
additional points in the program tree to handle each additional argument to
the parity function.
In contrast, when we perform the linear regression for the runs with
automatically defined functions, we find that the structural complexity, $\bar{S}_{with}$, can
be expressed in terms of the number of arguments, A, as
$\bar{S}_{with} = -115.5 + 50.6A$,
with a correlation of 0.95. The slope indicates that it takes only about an
additional fifty points in the program tree to handle each additional argument to
the parity function. The slope with automatically defined functions is only
about half of the slope without them. Thus, as the size of the problem is
scaled up, the average size of the solutions with automatically defined
functions grows at less than half the rate of the solutions without them.
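The two regressions can be reproduced, up to rounding, with an ordinary least-squares fit on the four tabulated averages. The following is a minimal sketch using only the standard library; the function name `linear_fit` is hypothetical, and the inputs are the rounded averages from the table of structural complexity for orders 3 through 6.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares: return (intercept, slope, correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return intercept, slope, correlation


arity = [3, 4, 5, 6]
s_without = [44.6, 112.6, 299.9, 328.0]
s_with = [48.2, 60.1, 156.8, 184.8]

for label, values in (("without ADFs", s_without), ("with ADFs", s_with)):
    intercept, slope, r = linear_fit(arity, values)
    print(f"{label}: S = {intercept:.1f} + {slope:.1f}A, correlation {r:.2f}")
```

With only four points, each fit is sensitive to any one value, in particular the estimated even-6-parity value of 328.0.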
This conclusion stands even though the value of 328.0 for $\bar{S}_{without}$ for the
even-6-parity problem is only a rough estimate and not actual data. Figure 6.6
indicates that the general direction of $\bar{S}_{without}$ averaged over the 19
runs is toward the mid 300s, thereby making the value of $\bar{S}_{without}$ for the
even-6-parity problem greater than the actual values of 44.6, 112.6, and
299.9 for the even-3-, 4-, and 5-parity problems, respectively. Any actual
value of $\bar{S}_{without}$ for the even-6-parity problem in the mid 300s or higher
will support the conclusion that the average size of the solutions produced
by genetic programming increases as a function of problem size at a lower
rate with automatically defined functions than without them.
We now turn our attention to the computational effort required for the
even-parity problem, with and without automatically defined functions.
Table 6.12 consolidates the values of the computational effort with and
without automatically defined functions, for the even-3-, 4-, 5-, and 6-parity
problems with a population size of 16,000. Since we were unable to solve the
even-6-parity problem without automatically defined functions after 19 runs,
the value of 70,176,000 for $E_{without}$ shown in this table for the even-6-parity
problem without automatically defined functions is the rough estimate
computed in section 6.6.
Table 6.12 Comparison of the computational effort for the even-3-, 4-, 5-, and
6-parity problems, with and without ADFs.
                Even-3   Even-4    Even-5      Even-6
$E_{without}$   96,000   384,000   6,528,000   70,176,000
$E_{with}$      64,000   176,000   464,000     1,344,000
Figure 6.22 shows the computational effort for the even-3-, 4-, 5-, and
6-parity problems with and without automatically defined functions. When
automatically defined functions are not used, there is an explosive growth in
$E_{without}$ (spanning about three orders of magnitude) as a function of problem
size. The curve applicable to automatically defined functions closely hugs the
horizontal axis and is barely visible on this figure.
Figure 6.22 Comparison of computational effort for the even-3-, 4-, 5-, and 6-parity problems,
with and without ADFs.
Figure 6.23 shows the same data as figure 6.22 using a logarithmic scale
on the vertical axis, thereby making the graph applicable to automatically
defined functions visible. The computational effort is dramatically less
with automatically defined functions than without them.
Figure 6.23 Comparison of computational effort for the even-3-, 4-, 5-, and 6-parity problems,
with and without ADFs, with logarithmic scale.
When we perform a linear regression on the progression of values of
computational effort, $E_{without}$ (96,000, 384,000, 6,528,000, and 70,176,000), we find
that $E_{without}$ can be expressed in terms of the number of arguments, A, as
$E_{without} = -78,100,000 + 21,640,000A$,
with a correlation of 0.82.
A glance at figure 6.23 suggests trying an exponential regression (i.e., a linear
regression on the logarithm of the dependent variable). When we perform an
exponential regression on the four-point curve without automatically defined
functions, we find that the computational effort, $E_{without}$, can be stated in terms
of the number of arguments, A, as
$E_{without} = 77.1 \times 10^{0.982A}$,
with a correlation of 0.99. The exponential regression produces a better fit to
this data than the linear regression; however, all conclusions must be
tempered by the considerable uncertainty introduced by the rough estimate we
used for the value of $E_{without}$ for the even-6-parity problem and by the small
number of data points involved. In addition, the fact that a curve can be fit to
empirical data does not, of course, establish the existence of any causal
relationship between the variables involved.
When we perform a linear regression on the progression of values
of $E_{with}$ (64,000, 176,000, 464,000, and 1,344,000), we find that the computational
effort, $E_{with}$, can be stated in terms of the number of arguments, A, as
$E_{with} = -1,350,000 + 413,000A$,
with a correlation of 0.92. It takes about 22 million additional fitness
evaluations to handle each additional argument to the parity function without
automatically defined functions as compared to about four hundred thousand
with them. This is a ratio of about 52:1.
When we perform an exponential regression on the four-point curve with
automatically defined functions, we find that the computational effort, $E_{with}$,
can be stated in terms of the number of arguments, A, as
$E_{with} = 3070 \times 10^{0.439A}$,
with a correlation of 0.99. This exponential function is a better fit to this data
than the straight line. The exponent is only 0.439 with automatically defined
functions as compared to 0.982 without them.
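An exponential regression of this kind is a straight-line fit to the base-10 logarithm of the effort. The sketch below, which is illustrative and uses only the standard library, fits $\log_{10} E = a + bA$ to the four effort values for each case and reports the model as $E = 10^a \times 10^{bA}$. The function name `exponential_fit` is hypothetical.

```python
import math


def exponential_fit(xs, ys):
    """Fit log10(y) = a + b*x; return (coefficient 10**a, exponent b, correlation)."""
    logs = [math.log10(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sll = sum((v - mean_l) ** 2 for v in logs)
    sxl = sum((x - mean_x) * (v - mean_l) for x, v in zip(xs, logs))
    exponent = sxl / sxx
    coefficient = 10 ** (mean_l - exponent * mean_x)
    correlation = sxl / math.sqrt(sxx * sll)
    return coefficient, exponent, correlation


arity = [3, 4, 5, 6]
e_without = [96_000, 384_000, 6_528_000, 70_176_000]
e_with = [64_000, 176_000, 464_000, 1_344_000]

for label, values in (("without ADFs", e_without), ("with ADFs", e_with)):
    c, b, r = exponential_fit(arity, values)
    print(f"{label}: E = {c:.3g} x 10^({b:.3f} A), correlation {r:.3f}")
```

An exponent of about 0.98 corresponds to roughly a tenfold increase in effort per additional argument, while an exponent of about 0.44 corresponds to a factor of about 2.7.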
The conclusion (main point 6) that will prove to be common to the three
problems in this book for which a progression of scaled-up versions is
studied (the parity problem, the lawnmower problem of chapter 8, and
the bumblebee problem of chapter 9) is that the computational effort
increases as a function of problem size at a lower rate with automatically
defined functions than without them. Both the linear regression and the
exponential regression support this conclusion for this problem. Note that
the lower rate (not the functional form) is the conclusion that will prove to be
consistent with the data from all three problems. This discussion of scaling
will continue in sections 8.15 and 9.13 and in chapter 10.
6.16 HIGHER-ORDER EVEN-PARITY PROBLEMS
As previously mentioned, after 19 runs with a population size of 16,000,
we were unable to evolve any solutions to the even-6-parity problem
without automatically defined functions. Nonetheless, we can solve the
even-parity problem for orders 7, 8, 9, 10, and 11 with hierarchical automatically
defined functions. In fact, we can do so with a population size of only
4,000 because we will not be making any runs without automatically
defined functions in this section. We stopped this demonstration at the
even-11-parity problem because the even-11-parity problem is extremely
time-consuming (given our practice of using 100% of the $2^k$ possible fitness cases
in our runs of Boolean problems).
We decided to use two four-argument automatically defined functions
throughout this section. This choice was not made on the basis of any
analysis of the nature of the Boolean parity problem, but instead was made on the
basis of available computer time. For historical reasons, the runs in this
section involving a population of size 4,000 are the only runs in this book
employing fitness proportionate selection and greedy over-selection (rather
than tournament selection).
6.16.1 Even-7-Parity Problem
In one run of the even-7-parity problem, the following 102-point best-of-run
program from generation 10 achieved a perfect value of raw fitness of 128
(out of 128):
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
(values (AND (OR (OR ARG0 ARG3) (NOR ARG3 ARG3)) (NOR
(NOR ARG0 ARG1) (AND ARG1 ARG0)))))
(defun ADF1 (ARG0 ARG1 ARG2 ARG3)
(values (NOR (ADF0 (ADF0 ARG3 ARG0 ARG0 ARG2) (OR ARG1
ARG1) (OR ARG1 ARG1) (OR ARG0 ARG1)) (AND (NOR ARG1
ARG3) (ADF0 ARG3 ARG0 ARG0 ARG2)))))
(values (ADF1 (OR (ADF0 D4 D5 D4 D5) (AND D1 D2)) (ADF1
(ADF0 (AND D2 D8) (ADF1 D0 D2 D9 D3) (ADF0 D4 D3 D4 D0)
(NAND D0 D3)) (NAND D0 D8) (AND D7 D6) (OR D7 D1))
(ADF1 (OR D7 D8) (ADF0 D9 D2 D9 D9) (AND D7 D3) (NOR D4
D7)) (OR (NOR D6 D7) (ADF0 D2 D2 D3 D4))))).
The first branch of this program defines a four-argument ADF0 (rule
26,214) which ignores two of its four arguments and is equivalent to
(ODD-2-PARITY ARG0 ARG1).
The second branch defines a four-argument ADF1 (rule 26,265) in terms of
ADF0 and is equivalent to
(EVEN-3-PARITY ARG0 ARG1 ARG3).
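These two equivalences illustrate a hierarchical definition: ADF1 obtains a three-argument parity by calling the two-argument parity in ADF0. They can be confirmed by exhaustive evaluation. The sketch below is a hand transliteration of the two function-defining branches into Python (an assumption of this example, with the function names in the listing read as AND, OR, and NOR), checked against directly computed parities.

```python
from itertools import product


def nor(a, b):
    return not (a or b)


def odd_parity(*bits):
    """True when an odd number of the arguments are true."""
    return sum(bits) % 2 == 1


def adf0(arg0, arg1, arg2, arg3):
    # Hand transliteration of the first function-defining branch.
    return ((arg0 or arg3) or nor(arg3, arg3)) and nor(
        nor(arg0, arg1), arg1 and arg0
    )


def adf1(arg0, arg1, arg2, arg3):
    # Hand transliteration of the second branch, which calls ADF0 hierarchically.
    return nor(
        adf0(adf0(arg3, arg0, arg0, arg2), arg1 or arg1, arg1 or arg1, arg0 or arg1),
        nor(arg1, arg3) and adf0(arg3, arg0, arg0, arg2),
    )


cases = list(product([False, True], repeat=4))
adf0_is_odd_2 = all(adf0(*c) == odd_parity(c[0], c[1]) for c in cases)
adf1_is_even_3 = all(adf1(*c) == (not odd_parity(c[0], c[1], c[3])) for c in cases)
print(adf0_is_odd_2, adf1_is_even_3)  # True True
```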
Substituting the definitions of ADF0 and ADF1, the result-producing branch
becomes
(EVEN-2-PARITY (EVEN-2-PARITY (NAND D0 D2) (ODD-2-PARITY D3 D1))
               (ODD-2-PARITY (EVEN-2-PARITY (NOT D5) D4)
                             (OR D2 D0))),
which is equivalent to the target even-7-parity function. Thus, in this
particular run, two lower-order parity functions were used as the basis for solving
the problem.
In another run of the even-7-parity problem, a 100%-correct 92-point
best-of-run program appears in generation 14. The first branch of this program
defines a four-argument ADF0 (rule 42,245). The second branch of this
program defines a four-argument ADF1 (rule 49,980) which ignores one of its
four arguments. ADF1 is equivalent to
(ODD-3-PARITY D3 D2 D1).
Figure 6.24 presents the performance curves based on 29 runs for the
even-7-parity problem with automatically defined functions. The cumulative
probability of success, $P(M,i)$, is 20.7% by generation 17 and 34.5%
by generation 50. The two numbers in the oval indicate that if this problem
is run through to generation 17, processing a total of $E_{with}$ = 1,440,000
individuals (i.e., 4,000 x 18 generations x 20 runs) is sufficient to yield a
solution to this problem with 99% probability.
The search space of 7-argument Boolean functions returning one value is
of size $2^{2^7} = 2^{128} \approx 10^{38}$.
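The size of this search space follows from counting truth tables: a Boolean function of k arguments has $2^k$ rows, and each row can independently be true or false. A short sketch with Python's arbitrary-precision integers, using hypothetical helper names, gives the order of magnitude for the 7-argument and 11-argument cases.

```python
def search_space_size(k):
    """Number of distinct Boolean functions of k arguments: 2 ** (2 ** k)."""
    return 2 ** (2 ** k)


def order_of_magnitude(n):
    """Exponent of the largest power of ten not exceeding the positive integer n."""
    return len(str(n)) - 1


for k in (7, 11):
    print(k, order_of_magnitude(search_space_size(k)))
# 7 38
# 11 616
```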
6.16.2 Even-8-Parity Problem
The 8-, 9-, 10-, and 11-parity problems can be similarly solved using
hierarchical automatically defined functions. While runs of the even-7-parity
function (which has $2^7 = 128$ fitness cases) are time-consuming, they are
still sufficiently fast to enable us to accumulate enough successful runs
after expending a reasonable amount of computer time to allow us to make
a meaningful performance curve. This is not the case for runs of the
even-8-parity problem and higher-order parity problems. Consequently, for each
of the 8-, 9-, 10-, and 11-parity problems, we made one set of four
simultaneous runs on our four-processor parallel LISP machine. The 8-, 9-, 10-,
and 11-parity problems were each solved at least once within our first
(and only) set of four runs for these problems.
Figure 6.24 Performance curves for the even-7-parity problem showing that $E_{with}$ = 1,440,000
with ADFs.
In one run of the even-8-parity problem, the best of generation 2f contains
186 points and attains a perfect value of raw fitness of 256. The first branch of
this program defines a four-argument ADF0 (rule 10,280). The second branch
of this program defines a four-argument ADF1 (rule 26,214). This branch
ignores two of its four arguments and is equivalent to
(ODD-2-PARITY ARG0 ARG1).
ADF0 is not a parity rule.
6.16.3 Even-9-Parity Problem
The best of generation 40 of one run of the even-9-parity problem evolved
a parity function of order four as one of its automatically defined
functions. This program contains 224 points and attains a perfect value of raw
fitness of 512. The first branch defines a four-argument ADF0 (rule 1,872).
The second branch of this program defines a four-argument ADF1 (rule
27,030) which is equivalent to the odd-4-parity function. Thus, the solution
to the even-9-parity problem employs a parity function of order four as one
of the two available automatically defined functions. This solution to the
9-parity problem is the first time we have seen the emergence of a 4-parity
function.
6.16.4 Even-10-Parity Problem
In a run of the even-10-parity problem, the best of generation 40 contains 200
points and attains a perfect value of raw fitness of 1,024. The first branch of
this program defines a four-argument ADF0 (rule 38,791). The second branch
of this program defines a four-argument ADF1 (rule 23,205). This branch
ignores one of its four arguments and is equivalent to
(EVEN-3-PARITY D3 D2 D0).
Notice that 186, 224, and 200 are the number of points in the above
solutions with automatically defined functions for the even-8-, 9-, and
10-parity problems, respectively. In comparison, solutions to the far simpler
even-5-parity problem average 299.9 points without automatically defined
functions.
6.16.5 Even-11-Parity Problem
Finally, in one run of the even-11-parity problem, the best of generation 21
contains 220 points and attains a perfect value of raw fitness of 2,048. It is
shown below:
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
(values (NAND (NOR (NAND (OR ARG2 ARG1) (NAND ARG1
ARG2)) (NOR (OR ARG1 ARG0) (NAND ARG3 ARG1))) (NAND
(NAND (NAND (NAND ARG1 ARG2) ARG1) (OR ARG3 ARG2))
(NOR (NAND ARG2 ARG3) (OR ARG1 ARG3))))))
(defun ADF1 (ARG0 ARG1 ARG2 ARG3)
(values (ADF0 (NAND (OR ARG3 (OR ARG0 ARG0)) (AND (NOR
ARG1 ARG1) (ADF0 ARG1 ARG1 ARG3 ARG3))) (NAND (NAND
(ADF0 ARG2 ARG1 ARG0 ARG3) (ADF0 ARG2 ARG3 ARG3
ARG2)) (ADF0 (NAND ARG3 ARG0) (NOR ARG0 ARG1) (AND
ARG3 ARG3) (NAND ARG3 ARG0))) (ADF0 (NAND (OR ARG0
ARG0) (ADF0 ARG3 ARG1 ARG2 ARG0)) (ADF0 (NOR ARG0
ARG0) (NAND ARG0 ARG3) (OR ARG3 ARG2) (ADF0 ARG1 ARG3
ARG0 ARG0)) (NOR (ADF0 ARG2 ARG1 ARG2 ARG0) (NAND
ARG3 ARG3)) (AND (AND ARG2 ARG1) (NOR ARG1 ARG2)))
(AND (NAND (OR ARG3 ARG2) (NAND ARG3 ARG3)) (OR (NAND
ARG3 ARG3) (AND ARG0 ARG0))))))
(values (OR (ADF1 D1 D0 (ADF0 (ADF1 (OR (NAND D1 D7) D1)
(ADF0 D1 D6 D2 D6) (ADF1 D6 D6 D4 D7) (NAND D6 D4)) (ADF1
(ADF0 D9 D3 D2 D6) (OR D10 D1) (ADF1 D3 D4 D6 D7) (ADF0
D10 D8 D9 D5)) (ADF0 (NOR D5 D9) (NAND D1 D10) (ADF0 D10
D5 D3 D5) (NOR D8 D2)) (OR D6 (NOR D1 D5))) D1) (NOR
(NAND D1 D10) (ADF0 (OR (ADF0 D6 D2 D8 D4) (OR D4 D7))
(NOR D10 D6) (NOR D1 D2) (ADF1 D3 D7 D7 D6)))))).
The first branch of this program defines the four-argument ADF0 (rule
50,115). This branch is equivalent to
(EVEN-2-PARITY ARG1 ARG2).
The second branch defines a four-argument ADF1 which is equivalent to
the even-4-parity function.
Substituting the definitions of the defined functions ADF0 and ADF1, the
result-producing branch simplifies to
(OR (EVEN-4-PARITY
      D1
      D0
      (EVEN-2-PARITY (EVEN-2-PARITY
                       (NAND D1 D10)
                       (EVEN-2-PARITY D5 D3))
                     (EVEN-4-PARITY
                       (EVEN-2-PARITY D3 D2)
                       (OR D10 D1)
                       (EVEN-4-PARITY D3 D4 D6 D7)
                       (EVEN-2-PARITY D8 D9)))
      D1)
    (NOR (NAND D1 D10)
         (EVEN-2-PARITY (NOR D10 D6) (NOR D1 D2)))).
The even-2-parity function (ADF0) appears six times and the even-4-parity
function (ADF1) appears three times in this simplified version of the
100%-correct solution to the even-11-parity problem. In other words, genetic
programming solved the even-11-parity problem by automatically
decomposing it into parity functions of lower orders.
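That this decomposition really computes the even-11-parity function can be confirmed by evaluating it on all 2,048 fitness cases. The sketch below is a hand transliteration of the simplified result-producing branch into Python. It assumes that the second-level EVEN-2-PARITY takes two arguments: the nested EVEN-2-PARITY expression and the EVEN-4-PARITY expression. The helper names are hypothetical.

```python
from itertools import product


def even_parity(*bits):
    """True when an even number of the arguments are true."""
    return sum(bits) % 2 == 0


def nand(a, b):
    return not (a and b)


def nor(a, b):
    return not (a or b)


def simplified_branch(d):
    """Simplified result-producing branch; d[j] stands for the input Dj."""
    inner = even_parity(
        even_parity(nand(d[1], d[10]), even_parity(d[5], d[3])),
        even_parity(
            even_parity(d[3], d[2]),
            d[10] or d[1],
            even_parity(d[3], d[4], d[6], d[7]),
            even_parity(d[8], d[9]),
        ),
    )
    return even_parity(d[1], d[0], inner, d[1]) or nor(
        nand(d[1], d[10]),
        even_parity(nor(d[10], d[6]), nor(d[1], d[2])),
    )


all_correct = all(
    simplified_branch(d) == even_parity(*d)
    for d in product([False, True], repeat=11)
)
print(all_correct)  # True
```

The non-parity pieces cancel neatly: (OR D10 D1) and (NAND D1 D10) together contribute the parity of D1 and D10, and the trailing NOR term is always false, so the OR at the root passes the parity result through unchanged.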
The unsimplified version of the 100%-correct solution to the
even-11-parity problem employing automatically defined functions contains
only 220 points. This size is smaller than the solutions without
automatically defined functions to the far simpler even-5-parity problem (which
average 299.9 points).
Figure 6.25 shows the simplified version of the result-producing branch of
this best-of-run individual from generation 21 for the even-11-parity
problem.
Figure 6.25 The result-producing branch of the best-of-run program from generation 21 for
the even-11-parity problem is assembled from even-2-parity and even-4-parity functions.
The above solution to the even-11-parity problem emerged on generation
21 of one of our four runs. Because the other three runs had each only reached
the neighborhood of 21 generations and because of the time-consuming nature
of this problem, the other three runs were abandoned.
The search space of 11-argument Boolean functions returning one value is
of size $2^{2^{11}} = 2^{2,048} \approx 10^{616}$.
A videotape visualization of the solution to the even-11-parity problem
(and 21 problems from Genetic Programming) can be found in Koza and Rice
1992a. See also Koza 1992b.
Determining the Architecture of the Program
In applying genetic programming with automatically defined functions to a
problem, it is first necessary to make a group of choices concerning the
architecture of the yet-to-be-evolved overall programs in the population. We have
called this group of architectural choices the sixth major step in preparing to
use genetic programming.
The sixth major step involves determining
(a) the number of function-defining branches,
(b) the number of arguments possessed by each function-defining branch,
and
(c) if there is more than one function-defining branch, the nature of the
hierarchical references (if any) allowed between the function-defining
branches.
After this sixth major step has been performed, the first and second major
steps in preparing to use genetic programming must be performed for each
branch of the overall program. That is, it is necessary to specify the terminal
set and function set for the result-producing branch as well as the terminal set
and function set for each function-defining branch in the overall program.
Once all of the preparatory steps have been performed, a run of genetic
programming may be made.
Sometimes these architectural choices flow so directly from the nature of
the problem that they are virtually mandated. However, in general, we have
no way of knowing a priori the optimal number of automatically defined functions or the optimal number of arguments for each such defined function that
will be useful for a given problem.
How should these architectural choices be made? How important are these
choices in determining whether genetic programming can solve a problem?
How influential are these choices in determining the amount of computational effort required to solve a problem?
Five different methods for making these architectural choices are discussed
in this book:
• prospective analysis of the nature of the problem,
• seemingly sufficient capacity,
• affordable capacity,
• retrospective analysis of the results of actual runs, and
• evolutionary selection of the architecture.
We start by reviewing the three of these five methods that we have used so
far in this book.
Then we will discuss the method of retrospective analysis. The retrospective
analysis for the even-5-parity problem indicates that it can be solved with any
of 15 different architectures that might reasonably have been chosen for it.
We defer discussion of the method of evolutionary determination of the
architecture until chapters 21 through 25, in which it will become clear that we
need not make any architectural choice at all.
7.1 METHOD OF PROSPECTIVE ANALYSIS
When we were preparing to solve the two-boxes problem (chapter 4), we
used the method of prospective analysis of the nature of the problem. We
chose three as the number of arguments for the automatically defined function because we knew that boxes have dimensionality 3 and could therefore
reasonably anticipate a useful decomposition involving a subproblem of
dimensionality 3. Also, because we knew the problem involved only boxes
(and not, say, a mixture of circles and pyramids), we could reasonably anticipate that the result-producing branch could assemble a solution through
multiple uses of just one automatically defined function.
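The architecture that this prospective analysis leads to can be sketched by hand. The sketch below is illustrative only and is not an evolved program: it assumes, following chapter 4, that the target of the two-boxes problem is the difference between the volumes of two boxes, and the names `adf0` and `result_producing_branch` and the argument names are our own.

```python
def adf0(arg0: float, arg1: float, arg2: float) -> float:
    """A single three-argument defined function (here, the volume of a box)."""
    return arg0 * arg1 * arg2


def result_producing_branch(l0, w0, h0, l1, w1, h1) -> float:
    """Assemble the answer through two uses of the one defined function."""
    return adf0(l0, w0, h0) - adf0(l1, w1, h1)


print(result_producing_branch(2, 3, 4, 1, 1, 1))
```

The six independent variables are consumed three at a time, which is why one three-argument defined function called twice is a natural architectural choice for this problem.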
If we had not known that this problem involved boxes (i.e., if the problem
had been presented as an unidentified problem of symbolic regression over
six independent variables), we would have had no reason either to choose
three as the number of arguments for the automatically defined function or to
choose one as the number of automatically defined functions. In that event,
we probably would have chosen five (i.e., one less than the number of independent variables) or perhaps six as the number of arguments for the automatically defined functions. In addition, we probably would have made more
than one automatically defined function available to each overall program
because we would not have known how many exploitable regularities might
be present in the problem environment.
Of course, if the problem had a natural decomposition involving a subproblem of dimensionality 4, our choice of five or six as the number of
arguments for the automatically defined function would almost certainly
not have precluded a solution. Indeed, we have repeatedly seen that genetic
programming often ignores available dummy variables in the body of an
automatically defined function. Similarly, a solution to the problem would
almost certainly not have been foreclosed if the number of available automatically defined functions that we chose had been less than the number
of exploitable regularities of the problem environment. More likely, some
of the potential gain in performance from exploiting some of the regularities would have been lost.
Similarly, in the Boolean 6-symmetry problem (subsection 5.2.1), we used
our knowledge that the symmetry function involves a process of pairwise
matching in choosing two as the number of arguments and one as the
number of automatically defined functions. If we had not known that
matching was involved in the symmetry problem, we probably would
have made the same choices we did for the Boolean 6-parity problem
(chapter 6).
The amount of prospective analysis that is appropriate based on foreknowledge of the nature of the problem depends on the user's goals. At one extreme,
if the user's goal is to solve a practical problem, then it is appropriate to use
all available analytic techniques, all available foreknowledge about the
underlying regularities, symmetries, and homogeneities of the problem, and
all available information about the problem environment in choosing the
architecture. At the other extreme, if the user is studying the nature of automated problem solving, the focus will be on using the minimum amount of
human analysis and knowledge.
7.2 METHOD OF PROVIDING SEEMINGLY SUFFICIENT CAPACITY
For many problems, the architectural choices can be made on the basis of
trying to provide seemingly sufficient capacity. Our approach to making the
architectural choices for the Boolean even-parity problems in chapter 6 illustrated this method. We envisaged that solutions to the even-3-, 4-, 5-, and
6-parity problems would involve lower-order parity functions. We therefore
made a seemingly sufficient number of automatically defined functions (each
with a seemingly sufficient number of arguments) available to the yet-to-be-evolved overall programs.
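The kind of decomposition we envisaged can be sketched by hand as follows. This is an illustration under our own assumptions, not a program produced by any run: the names `even_2_parity` and `even_5_parity` are hypothetical, and Python's Boolean operators stand in for the function set actually used.

```python
from itertools import product


def even_2_parity(a: bool, b: bool) -> bool:
    """True when an even number (zero or two) of the two arguments are True."""
    return a == b


def even_5_parity(d0: bool, d1: bool, d2: bool, d3: bool, d4: bool) -> bool:
    # Chaining four even-2-parity calls yields the odd-5-parity of the inputs,
    # so one final negation is needed to obtain even-5-parity.
    chained = even_2_parity(
        even_2_parity(even_2_parity(even_2_parity(d0, d1), d2), d3), d4
    )
    return not chained


def matches_definition() -> bool:
    """Compare against the definition on all 32 fitness cases."""
    return all(
        even_5_parity(*bits) == (sum(bits) % 2 == 0)
        for bits in product([False, True], repeat=5)
    )


print(matches_definition())
```

One two-argument defined function is enough for this particular decomposition; providing more functions or more arguments merely leaves room for other decompositions.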
As previously mentioned, we were wrong in anticipating that genetic
programming would usually decompose these parity problems into lower-order parity functions (table 6.6). Nonetheless, we provided seemingly sufficient capacity to enable genetic programming to make such a decomposition.
In fact, genetic programming did make a decomposition into lower-order
functions; it just turned out that the lower-order functions usually were not
parity rules.
The sextic-polynomial problem (subsection 5.1.1) also illustrates this
approach of providing seemingly sufficient capacity. We envisaged a
solution based on the repeated roots of the polynomial. In actual practice,
genetic programming never produced a solution based on the repeated
roots; instead, it frequently used the available automatically defined function as a squaring function and used the result-producing branch to identify the square root of the polynomial. Nonetheless, we attempted to
provide seemingly sufficient capacity and, in fact, provided sufficient
capacity to solve the problem.
The four-sines problem (subsection 5.3.1) also illustrates the method of
providing seemingly sufficient capacity.
7.3 METHOD OF USING AFFORDABLE CAPACITY
Considerable additional computer resources (time and virtual memory) are
required by each additional function-defining branch, especially if they are
permitted to call one another hierarchically. Additional computer resources
are also consumed by additional arguments for each automatically defined
function. Thus, in practice, the amount of computer resources that one can
afford to devote to a particular problem will strongly influence or dictate the
architectural choice. Although we would like to have had the luxury of analyzing the problems in this book using the first two methods described above,
computer time was, in fact, the controlling factor in most of the architectural
choices actually made.
We used a four-processor parallel Texas Instruments Explorer II+ computer (a LISP machine) of late 1980s vintage for all the runs of problems
reported in this book. Except for a few simple problems in the early chapters, a single run of most problems in this book consumed between about
a half day and several days of computer time on one processor. Single runs
of some problems required up to 10 days. Moreover, memory fragmentation due to garbage collection over the duration of a run and other behaviors peculiar to LISP machines are an additional practical factor limiting
the length of runs. As already discussed in section 6.16, our architectural
choices for the even-6-, 7-, 8-, 9-, 10-, and 11-parity problems were dictated almost entirely by considerations of available computer resources,
not by clever considerations of how to decompose Boolean problems. It
would have been interesting to see how genetic programming solved the
11-parity problem if the overall program consisted of a half dozen 11-argument
automatically defined functions. Indeed, an accurate way of stating our actual methodology is as follows: In making these choices, we
hoped that the capacity that we could afford to devote to the problem
would prove to be sufficient to solve the problem.
7.4 METHOD OF RETROSPECTIVE ANALYSIS
A retrospective analysis can be used to determine the optimal number of
automatically defined functions and number of arguments that they each
possess for a given problem. If one is dealing with a number of related
problems, a retrospective analysis of one problem may provide guidance
for making the required architectural choice for a similar problem.
A retrospective analysis may also indicate whether, and to what extent,
architectural choices matter in runs of genetic programming with automatically defined functions.
The idea is to make a number of runs of the problem with different combinations of the number of automatically defined functions and the number of
arguments that they each possess, to compute the computational effort
required to solve the problem with each such architecture, and to identify the
optimal architecture.
Boolean functions are often good candidates for conducting comparative
experiments. The Boolean even-5-parity problem is the smallest problem that
has interesting decompositions. A population size of 4,000 is used in this section. Fifteen different architectures will be tested with the number of automatically defined functions ranging between one and five and with their
number of arguments ranging between two and four. A single run of the
even-5-parity problem requires about 6 to 18 hours with this population size,
depending on the architecture.
Except for the choice of 4,000 as the population size, our approach to the
problem is the same as described in tables 6.2 and 6.3. When there are two or
more function-defining branches, each automatically defined function can
refer hierarchically to every other already-defined (lower-numbered) function.
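The 15 architectures and the rule for hierarchical references can be written down compactly. The sketch below is illustrative only: each architecture is represented by its argument map (the list of argument counts, one per defined function), the names `uniform_argument_maps` and `allowed_calls` are our own, and we assume that the result-producing branch (labeled RPB here) may call every defined function.

```python
from itertools import product


def uniform_argument_maps(max_adfs=5, arg_counts=(2, 3, 4)):
    """One to max_adfs defined functions, all with the same number of arguments."""
    return [
        tuple([n_args] * n_adfs)
        for n_adfs, n_args in product(range(1, max_adfs + 1), arg_counts)
    ]


def allowed_calls(argument_map):
    """Each ADF may call only the lower-numbered ADFs defined before it."""
    n = len(argument_map)
    calls = {f"ADF{j}": [f"ADF{i}" for i in range(j)] for j in range(n)}
    calls["RPB"] = [f"ADF{i}" for i in range(n)]  # assumption: RPB may call all ADFs
    return calls


maps = uniform_argument_maps()
print(len(maps))
print(allowed_calls((3, 3, 3)))
```

Because references run only from higher-numbered to lower-numbered functions, no defined function can call itself, directly or indirectly.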
7.4.1 Baseline for the Even-5-Parity Problem without ADFs
We first solve the even-5-parity problem without automatically defined functions with a population size of 4,000.
The average structural complexity, $S_{without}$, of the 100%-correct programs
from the 11 successful runs (out of 25 runs) of the even-5-parity problem without automatically defined functions is 299.89 points.
Figure 7.1 presents the performance curves based on the 25 runs of the
even-5-parity problem without automatically defined functions. The cumulative probability of success, P(M,i), is 44% by generation 50. The two
numbers in the oval indicate that if this problem is run through to generation
50, processing a total of $E_{without}$ = 1,632,000 individuals (i.e., 4,000 × 51 generations × 8 runs) is sufficient to yield a solution to this problem with
99% probability.
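The arithmetic behind the two numbers in the oval can be reproduced in a few lines. This is a sketch under the usual definitions, restated here as assumptions: the number of independent runs is R(z) = ⌈log(1 − z) / log(1 − P(M,i))⌉ and the number of individuals to be processed is I(M,i,z) = M × (i + 1) × R(z). The function names are our own.

```python
import math


def runs_required(p_success: float, z: float = 0.99) -> int:
    """R(z): independent runs needed to succeed with probability z."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(m: int, i: int, p_success: float, z: float = 0.99) -> int:
    """I(M, i, z): generations 0 through i are processed in each of R(z) runs."""
    return m * (i + 1) * runs_required(p_success, z)


# Baseline without ADFs: M = 4,000, P(M, 50) = 44%.
print(runs_required(0.44))
print(individuals_to_process(4000, 50, 0.44))
```

The computational effort is the minimum of I(M,i,z) over the generations i; for the baseline, that minimum is attained at generation 50.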
Figure 7.1 Performance curves for the even-5-parity problem showing that $E_{without}$ = 1,632,000
without ADFs.
Figure 7.2 Performance curves for the even-5-parity problem showing that $E_{with}$ = 300,000
with ADFs having an argument map of {2}.
We now proceed to solve this problem with 15 different architectures using
automatically defined functions. Each group of runs is identified by the argument map associated with the set of their automatically defined functions.
7.4.2 One Two-Argument ADF
Figure 7.2 presents the performance curves based on 64 runs of the even-5-parity problem with one two-argument automatically defined function,
showing that it is sufficient to process 300,000 individuals to yield a solution
with 99% probability.
7.4.3 One Three-Argument ADF
Figure 7.3 presents the performance curves based on 35 runs of the even-5-parity problem with one three-argument automatically defined function,
showing that it is sufficient to process 384,000 individuals to yield a solution
with 99% probability.
7.4.4 One Four-Argument ADF
Figure 7.4 presents the performance curves based on 75 runs of the even-5-parity problem with one four-argument automatically defined function,
showing that it is sufficient to process 592,000 individuals to yield a solution
with 99% probability.
Figure 7.3 Performance curves for the even-5-parity problem showing that $E_{with}$ = 384,000
with ADFs having an argument map of {3}.
Figure 7.4 Performance curves for the even-5-parity problem showing that $E_{with}$ = 592,000
with ADFs having an argument map of {4}.
Figure 7.5 Performance curves for the even-5-parity problem showing that $E_{with}$ = 272,000
with ADFs having an argument map of {2, 2}.
7.4.5 Two Two-Argument ADFs
Figure 7.5 presents the performance curves based on 55 runs of the even-5-parity problem with two two-argument automatically defined functions,
showing that it is sufficient to process 272,000 individuals to yield a solution
with 99% probability.
7.4.6 Two Three-Argument ADFs
Figure 7.6 presents the performance curves based on 93 runs of the even-5-parity problem with two three-argument automatically defined functions,
showing that it is sufficient to process $E_{with}$ = 400,000 individuals to yield a
solution with 99% probability. Figure E.1 reports on an additional 32 runs
made for this problem with the computer code shown in that appendix; the
computational effort, $E_{with}$, measured by means of those 32 runs is also 400,000.
7.4.7 Two Four-Argument ADFs
Figure 7.7 presents the performance curves based on 43 runs of the even-5-parity problem with two four-argument automatically defined functions,
showing that it is sufficient to process 656,000 individuals to yield a solution
with 99% probability.
7.4.8 Three Two-Argument ADFs
Figure 7.8 presents the performance curves based on 117 runs of the even-5-parity problem with three two-argument automatically defined functions,
showing that it is sufficient to process 380,000 individuals to yield a solution
with 99% probability.
Figure 7.6 Performance curves for the even-5-parity problem showing that $E_{with}$ = 400,000
with ADFs having an argument map of {3, 3}.
Figure 7.7 Performance curves for the even-5-parity problem showing that $E_{with}$ = 656,000
with ADFs having an argument map of {4, 4}.
Figure 7.8 Performance curves for the even-5-parity problem showing that $E_{with}$ = 380,000
with ADFs having an argument map of {2, 2, 2}.
7.4.9 Three Three-Argument ADFs
Figure 7.9 presents the performance curves based on 36 runs of the even-5-parity problem with three three-argument automatically defined functions,
showing that it is sufficient to process 272,000 individuals to yield a solution
with 99% probability.
7.4.10 Three Four-Argument ADFs
Figure 7.10 presents the performance curves based on 37 runs of the even-5-parity problem with three four-argument automatically defined functions,
showing that it is sufficient to process 672,000 individuals to yield a solution
with 99% probability.
7.4.11 Four Two-Argument ADFs
Figure 7.11 presents the performance curves based on 58 runs of the even-5-parity problem with four two-argument automatically defined functions,
showing that it is sufficient to process 360,000 individuals to yield a solution
with 99% probability.
Figure 7.9 Performance curves for the even-5-parity problem showing that $E_{with}$ = 272,000
with ADFs having an argument map of {3, 3, 3}.
Figure 7.10 Performance curves for the even-5-parity problem showing that $E_{with}$ = 672,000
with ADFs having an argument map of {4, 4, 4}.
Figure 7.11 Performance curves for the even-5-parity problem showing that $E_{with}$ = 360,000
with ADFs having an argument map of {2, 2, 2, 2}.
7.4.12 Four Three-Argument ADFs
Figure 7.12 presents the performance curves based on 40 runs of the even-5-parity problem with four three-argument automatically defined functions,
showing that it is sufficient to process 420,000 individuals to yield a solution
with 99% probability.
7.4.13 Four Four-Argument ADFs
Figure 7.13 presents the performance curves based on 44 runs of the even-5-parity problem with four four-argument automatically defined functions,
showing that it is sufficient to process 912,000 individuals to yield a solution
with 99% probability.
7.4.14 Five Two-Argument ADFs
Figure 7.14 presents the performance curves based on 67 runs of the even-5-parity problem with five two-argument automatically defined functions,
showing that it is sufficient to process 360,000 individuals to yield a solution
with 99% probability.
7.4.15 Five Three-Argument ADFs
Figure 7.15 presents the performance curves based on 63 runs of the even-5-parity
problem with five three-argument automatically defined functions,
showing that it is sufficient to process 512,000 individuals to yield a solution
with 99% probability.
Figure 7.12 Performance curves for the even-5-parity problem showing that $E_{with}$ = 420,000
with ADFs having an argument map of {3, 3, 3, 3}.
Figure 7.13 Performance curves for the even-5-parity problem showing that $E_{with}$ = 912,000
with ADFs having an argument map of {4, 4, 4, 4}.
Figure 7.14 Performance curves for the even-5-parity problem showing that $E_{with}$ = 360,000
with ADFs having an argument map of {2, 2, 2, 2, 2}.
Figure 7.15 Performance curves for the even-5-parity problem showing that $E_{with}$ = 512,000
with ADFs having an argument map of {3, 3, 3, 3, 3}.
Figure 7.16 Performance curves for the even-5-parity problem showing that $E_{with}$ = 736,000
with ADFs having an argument map of {4, 4, 4, 4, 4}.
7.4.16 Five Four-Argument ADFs
Figure 7.16 presents the performance curves based on 42 runs of the even-5-parity problem with five four-argument automatically defined functions,
showing that it is sufficient to process 736,000 individuals to yield a solution
with 99% probability.
7.5 SUMMARY OF RETROSPECTIVE ANALYSIS
The results in the previous 15 subsections show that genetic programming is capable of solving the even-5-parity problem with all 15 architectures with a population size of 4,000.
Table 7.1 consolidates the results of the runs with these 15 combinations of
choices of different numbers of defined functions (the first column) and different numbers of arguments that they each possess (the second column).
The third column shows the computational effort, $E_{with}$, required. $E_{with}$ is the
minimal value of I(M,i,z) and is realized at generation $i^*$ (shown in the fourth
column). The probability of success at generation $i^*$ is P(M,$i^*$) (shown in the
fifth column). The number of independent runs required is R(z) (shown in
the sixth column). The probability of success at generation 50, P(M,50), is
shown in the seventh column.
As previously mentioned in subsection 7.4.1, the baseline value for
computational effort, $E_{without}$, required for the even-5-parity problem without automatically defined functions and with a population size of 4,000
is 1,632,000.
When automatically defined functions are used with any of the 15
architectures, $E_{with}$ always proves to be considerably less than this value of
1,632,000 for $E_{without}$. Specifically, $E_{with}$ varies from a low of 272,000 (17%
of 1,632,000) to a high of 912,000 (56% of 1,632,000).
Table 7.1 Consolidated table of the computational effort, $E_{with}$, and other statistics
for 15 different architectures for the even-5-parity problem.
Number    Number         E_with     i*    P(M,i*)   R(z)   P(M,50)
of ADFs   of arguments
1         2              300,000    24    83%       3      90%
1         3              384,000    23    71%       4      77%
1         4              592,000    36    69%       4      76%
2         2              272,000    16    69%       4      80%
2         3              400,000    24    69%       4      84%
2         4              656,000    40    70%       4      77%
3         2              380,000    18    61%       5      79%
3         3              272,000    16    69%       4      94%
3         4              672,000    27    54%       6      76%
4         2              360,000    34    62%       5      81%
4         3              420,000    34    80%       3      85%
4         4              912,000    37    55%       6      61%
5         2              360,000    17    61%       5      70%
5         3              512,000    31    70%       4      84%
5         4              736,000    45    69%       4      76%
The conclusion is that the decision to use automatically defined functions is far
more important than the decision to use a particular architecture for the automatically defined functions.
Table 7.2 presents the values of computational effort, $E_{with}$, from table 7.1
as a two-dimensional table. $E_{with}$ attains its minimum value of 272,000 for the
15 architectural choices both when there are two two-argument defined functions
and when there are three three-argument defined functions.
Figure 7.17 presents the computational effort, $E_{with}$, from table 7.2 for each
of the 15 combinations of choices of the number of automatically defined
functions and the number of arguments that they each possess. The global
minimum value of $E_{with}$ in this table is 272,000; it is realized for both two two-argument automatically defined functions and three three-argument automatically defined functions.
The last row of table 7.2 shows that the computational effort is distinctly
higher for this problem when the defined functions have four arguments.
The last column of the table shows that the most computational effort is
required for five automatically defined functions when they each possess three
or four arguments. This problem is most readily solved when both the number of automatically defined functions and the number of arguments are three
or less. Additional arguments and additional automatically defined functions
are excessive, in retrospect, for this problem.
Table 7.2 Computational effort, $E_{with}$, for 15 different architectures for the even-5-parity problem with ADFs.
Number of     Number of ADFs
arguments     1          2          3          4          5
2             300,000    272,000    380,000    360,000    360,000
3             384,000    400,000    272,000    420,000    512,000
4             592,000    656,000    672,000    912,000    736,000
Figure 7.17 Computational effort, $E_{with}$, for 15 different architectures for the even-5-parity
problem with ADFs.
However, the key result of these experiments is that this problem is solved
for all 15 architectural choices with automatically defined functions. Moreover, all 15 architectural choices are superior to the case when automatically
defined functions are not used. The architectural choice can merely affect
the number of fitness evaluations by a factor of up to 3.4:1.
The 15 architectures that we have just examined are uniform in the sense
that they do not include architectures in which the automatically defined functions within a program possess different numbers of arguments. There are $3^k$
different ways of assigning a number of arguments (between two and four)
to k hierarchical automatically defined functions. Thus, there are 360 different
architectures when the number of arguments is between two and four and
when k is between 2 and 5. The 345 nonuniform architectures are arguably
subsumed, in one sense, by one of the 15 uniform architectures examined
above because automatically defined functions are capable of selectively
ignoring their dummy variables. However, they do present genetic programming with a different working environment.
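The count of $3^k$ argument maps for each k can be confirmed by direct enumeration. The sketch below is illustrative; the name `argument_maps` is our own.

```python
from itertools import product


def argument_maps(k: int, arg_counts=(2, 3, 4)):
    """All ways of giving each of k ordered ADFs two, three, or four arguments."""
    return list(product(arg_counts, repeat=k))


counts = {k: len(argument_maps(k)) for k in range(2, 6)}
total = sum(counts.values())
print(counts)
print(total)
```

The order of the entries in an argument map matters because the defined functions are hierarchical: {2, 3} and {3, 2} give the later function a different lower-numbered function to call.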
Figure 7.18 Performance curves for the even-5-parity problem showing that $E_{with}$ = 352,000
with ADFs having an argument map of {2, 3}.
It is obviously impractical to test all 345 of these additional architectures;
however, we tested a few such architectures out of curiosity. For example,
figure 7.18 presents the performance curves based on 36 runs of the even-5-parity problem when the argument map for the automatically defined
functions is {2, 3}. This figure shows that it is sufficient to process 352,000 individuals to yield a solution with 99% probability.
The value of $E_{with}$ of 352,000 is intermediate between the 272,000 fitness
evaluations required for two two-argument automatically defined functions
and the 400,000 fitness evaluations required for two three-argument automatically defined functions.
Table 7.3 shows the efficiency ratio, $R_E$, for the 15 combinations of the number of automatically defined functions and the number of arguments for the
even-5-parity problem. Each entry in this table is obtained by dividing
1,632,000, the baseline computational effort, $E_{without}$, without automatically
defined functions (subsection 7.4.1), by the corresponding entry from table 7.2. All
15 efficiency ratios are above 1, indicating that automatically defined functions are beneficial. The largest efficiency ratio of 6.00 is achieved for the two
architectures for which $E_{with}$ is 272,000. The lowest two efficiency ratios (the
2.22 in the lower right corner and the 1.79 near that corner) are obtained when
an apparently excessive number (4) of arguments is used in conjunction with
an apparently excessive number (4 or 5) of automatically defined functions.
The additional overhead associated with these two excessive architectures
apparently counterbalances the advantages of using automatically defined
functions on this problem.
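The efficiency ratios follow mechanically from the computational-effort figures reported above. The sketch below is illustrative: the dictionary is keyed by (number of ADFs, number of arguments), and the names `E_WITHOUT`, `E_WITH`, and `efficiency_ratio` are our own.

```python
E_WITHOUT = 1_632_000  # baseline computational effort without ADFs

# Computational effort with ADFs, keyed by (number of ADFs, number of arguments).
E_WITH = {
    (1, 2): 300_000, (1, 3): 384_000, (1, 4): 592_000,
    (2, 2): 272_000, (2, 3): 400_000, (2, 4): 656_000,
    (3, 2): 380_000, (3, 3): 272_000, (3, 4): 672_000,
    (4, 2): 360_000, (4, 3): 420_000, (4, 4): 912_000,
    (5, 2): 360_000, (5, 3): 512_000, (5, 4): 736_000,
}


def efficiency_ratio(e_with: int) -> float:
    """Baseline effort divided by effort with ADFs; above 1 means ADFs help."""
    return E_WITHOUT / e_with


ratios = {arch: efficiency_ratio(e) for arch, e in E_WITH.items()}
best = max(ratios.values())
worst_arch = min(ratios, key=ratios.get)
spread = max(E_WITH.values()) / min(E_WITH.values())
print(round(best, 2), worst_arch, round(spread, 1))
```

The same data give the spread between the most and least expensive architectures, which is the factor of about 3.4 mentioned earlier.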
Table 7.3 Efficiency ratios, $R_E$, for 15 different architectures for the even-5-parity
problem.
Number of     Number of ADFs
arguments     1       2       3       4       5
2             5.44    6.00    4.29    4.53    4.53
3             4.25    4.08    6.00    3.88    3.18
4             2.76    2.49    2.43    1.79    2.22
The data in table 7.3 for the 15 different architectures for the even-5-parity
problem, the data in table 6.10 concerning the even-3-, 4-, 5-, and 6-parity
problems, the data in table 5.20 for the scaled-up versions of the four problems from chapter 5, the fact that it is possible to solve the even-parity problems of orders 6, 7, 8, 9, 10, and 11 with automatically defined functions (section
6.16), and the data in numerous additional tables that will appear later in this
book all provide evidence to support main point 3 of this book:
Main point 3: For a variety of problems, genetic programming requires less
computational effort to solve a problem with automatically defined functions
than without them, provided the difficulty of the problem is above a certain
relatively low problem-specific breakeven point for computational effort.
This conclusion accurately reflects the cumulative evidence in this book
over a range of problems from different fields. Like the other main points of
this book, it is not stated as a theorem; no mathematical proof is offered.
There are no exceptions to this conclusion anywhere in this book or in any
runs of any other problems of which I am aware. Exceptions to this conclusion will almost certainly be uncovered as automatically defined functions
are studied further. These probable future exceptions should then lead, over
time, to refinement, modification, and qualification of this conclusion concerning the effect on automated problem-solving of regularities, symmetries,
and homogeneities in problem environments.
The above conclusion is, of course, already qualified in the sense that it
incorporates the imprecisely defined concept of breakeven point. The simple
and scaled-up versions of the four problems from chapter 5 strongly suggest
that there are problems with sufficient modularity to benefit from hierarchical decomposition and that there are problems whose modularity is so
meagre that they do not so benefit. However, I do not claim to define precisely
the nature of this separation or its exact location in the space of problems.
Nonetheless, the validity of experimentally obtained evidence is not negated
by the absence of mathematical proofs or complete explanations of observed
phenomena. Indeed, most science (unlike almost all "computer science") proceeds without airtight proofs. First, questions are raised. Then, experiments
are conducted to accumulate evidence. Next, explanations that encapsulate
the observed evidence are formalized. Additional experiments are then conducted, usually with the result that the current hypothesis must be refined,
modified, or qualified. Finally, at some point, a unifying theory emerges.
We now consider the average structural complexity of the solutions evolved
by genetic programming with automatically defined functions.
Table 7.4 Average structural complexity, $S_{with}$, of the solutions to the even-5-parity
problem for 15 different architectures with ADFs.
Number of     Number of ADFs
arguments     1        2        3        4        5
2             82.5     99.5     119.3    131.0    149.5
3             119.4    152.6    176.0    217.1    248.5
4             166.0    225.8    271.0    391.5    436.6
Table 7.4 presents the average structural complexity, $S_{with}$, for each of the
15 combinations of choices of different numbers of automatically defined functions with different numbers of arguments of solutions to the even-5-parity
problem with automatically defined functions.
Table 7.5 presents the structural complexity ratio, $R_S$, for each of the 15
architectures. Each entry in this table is obtained by dividing 299.89, the average structural complexity, $S_{without}$,
of solutions to the even-5-parity problem without automatically defined functions (subsection 7.4.1), by the corresponding entry from table 7.4. Except for two of these 15 architectures, these ratios
are greater than 1 (indicating that the average structural complexity, $S_{with}$, of
the solutions is less when automatically defined functions are being used).
The two exceptions occur when there are four or five four-argument automatically defined functions. One explanation for the two exceptions is that
they employ an excessive number (four or five) of automatically defined functions and an excessive number (four) of arguments (for a problem with only
five independent variables).
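The ratio calculation described above can be checked with a short Python sketch. The layout of the dictionary (keys are the number of arguments and the number of ADFs) is an assumption about how the flattened table 7.4 reads; the numeric values and the constant 299.89 come from the text, while the names `S_WITH`, `S_WITHOUT`, and `ratios` are purely illustrative.

```python
# Average structural complexity without ADFs (from the text).
S_WITHOUT = 299.89

# Table 7.4 values keyed by (number of arguments, number of ADFs).
# The row/column assignment is an assumed reading of the table layout.
S_WITH = {
    (2, 1): 82.5, (2, 2): 99.5, (2, 3): 119.3, (2, 4): 131.0, (2, 5): 149.5,
    (3, 1): 119.4, (3, 2): 152.6, (3, 3): 176.0, (3, 4): 217.1, (3, 5): 248.5,
    (4, 1): 166.0, (4, 2): 225.8, (4, 3): 271.0, (4, 4): 391.5, (4, 5): 436.6,
}

# Structural complexity ratio: complexity without ADFs over complexity with ADFs.
ratios = {arch: S_WITHOUT / s_with for arch, s_with in S_WITH.items()}

# Architectures whose ratio falls below 1, i.e. where ADFs did not help parsimony.
exceptions = sorted(arch for arch, ratio in ratios.items() if ratio < 1.0)

for arch in sorted(ratios):
    print(arch, round(ratios[arch], 2))
print("ratios below 1:", exceptions)
```

Under this reading, the only architectures with a ratio below 1 are the two with four arguments and four or five ADFs, which agrees with the two exceptions named in the text.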
The following 11 exceptions prevent making an unqualified statement that
automatically defined functions improve the parsimony of the solutions
evolved by genetic programming:
(1) the two-boxes problem (figure 4.20),
(2) the simpler versions of the four problems in chapter 5 (the first four rows
of table 5.20),
(3) the scaled-up versions of two of the four problems in chapter 5 (specifically, the sextic polynomial $x^6 - 2x^4 + x^2$ and the three-term expression
$x/\pi + x^2/\pi^2 + 2\pi x$, as shown in the fifth and eighth rows of table 5.20),
(4) the even-3-parity problem (figure 6.11),
(5) the two architectural choices (out of 15) for the even-5-parity problem
(table 7.5), and
(6) the subset-creating version of the transmembrane problem (table 18.13).
Eleven exceptions may seem so excessive as to bring the entire proposition
into question. However, a closer examination indicates that eight of these 11
exceptions relate to very simple problems. The first of the eight exceptions
relates to the two-boxes problem. Four relate to the simpler versions of the
four problems in chapter 5. Two additional exceptions (the sextic polynomial
and the three-term expression) are both "scaled-up" versions of their respective pair of problems from chapter 5; however, both of these "scaled-up" versions are, in fact, still relatively simple problems. The even-3-parity problem
is also the simplest problem in the progression of parity problems. Moreover,
the offending ratios for these three last problems (0.98, 0.92, and 0.92, respectively) are all close to 1.
Table 7.5 Structural complexity ratios, $R_S$, for 15 different architectures for the even-5-parity problem. Rows give the number of arguments; columns give the number of ADFs.
            1       2       3       4       5
2        3.64    3.01    2.51    2.29    2.01
3        2.51    1.97    1.70    1.38    1.21
4        1.81    1.32    1.11    0.77    0.69
The simplicity of these first eight exceptions suggests the existence of a
breakeven point for average structural complexity. That is, eight of the 11
exceptions can reasonably be explained because the problems lie on the wrong
side of a breakeven point for average structural complexity.
The three other exceptions involve the subset-creating version of the transmembrane problem (where the "average" structural complexity in table 18.13
comes from only one successful run) and the two extreme architectures for
the even-5-parity problem. The first of these three exceptions may, of course,
be a matter of inadequate measurement. There is not sufficient evidence to
support any particular explanation for exceptions relating to the two extreme
architectural choices (out of 15) for the even-5-parity problem. It may be that
an architecture can be so excessive and mismatched to the problem at hand as
to outweigh the potential advantages of automatically defined functions. This
possible explanation suggests future experimentation over additional types
of problems.
In spite of the absence of sufficient evidence to adopt any particular explanation for the two exceptions, the evidence does support a conclusion that is
true most of the time: Automatically defined functions do improve the parsimony of the solutions evolved by genetic programming provided the difficulty of the problem is above a breakeven point for average structural
complexity. That is, genetic programming usually (but not always) yields solutions that have smaller average overall size with automatically defined functions than without them. This qualified conclusion is stated as main point 4:
Main point 4: For a variety of problems, genetic programming usually
yields solutions with smaller overall size (lower average structural complexity) with automatically defined functions than without them, provided the
difficulty of the problem is above a certain problem-specific breakeven point
for average structural complexity.
Main point 4 was an unanticipated product of our experiments on automatically defined functions. Before starting these experiments, I expected
automatically defined functions to reduce computational effort in some way;
however, I did not expect any improvement in parsimony. In retrospect, an
improvement in parsimony from automatically defined functions seems very
reasonable since decomposing problems into subproblems and reusing the
solutions to subproblems should reduce the total size of the program.
Whenever I give a talk on genetic programming, someone always asks how
the genetically evolved programs can be made smaller and simpler. Holding
aside my general concern that forcing programs to be parsimonious may be
counterproductive in the overall effort to get computers to program themselves without being explicitly programmed, I have previously given the following three answers:
First, the population of programs can be simplified during a run by means
of the editing operation (Genetic Programming, subsection 6.5.3).
Second, the genetically evolved best-of-run program can be simplified after
it is produced by genetic programming in a post-run process (by means of the
editing techniques described in Genetic Programming, appendix F).
Third, parsimony can be made part of the fitness measure (see the block-stacking problem in Genetic Programming, subsection 18.1.3). However, the
overt incorporation of parsimony into the fitness measure raises significant
practical and theoretical issues. The practical issue concerns finding a principled way to choose the relative shares for these two competing factors in the
fitness calculation and the gradient to be used in allocating these shares. A
blended fitness measure trades off a certain amount of correctness for a certain amount of parsimony (Koza 1972). Should the relative share of the blended
measure be based on a percentage, an additive formula, or some other formula? If, say, a percentage is chosen, should one allocate 5%, 10%, 25%, 33%,
or some other percentage to parsimony as opposed to correctness? Even more
vexing, how does one apportion the allocated percentages over imperfect programs with lesser or greater degrees of parsimony or correctness? It is not at
all clear how to do this in a principled way over a wide range of problems.
For the particular (and important) case of symbolic regression (system-identification) problems, the minimum description length (MDL) principle has
been suggested as a way to make this tradeoff in a principled way with a
minimum of ad hoc choices (Iba, Kurita, de Garis, and Sato 1993; Iba, de Garis,
and Sato 1993, 1994). However, unparsimonious structures play a unique and
important role in genetic programming. Many of the seemingly extraneous
parts of genetically evolved programs apparently serve as reservoirs of genetic
material; premature efforts to simplify programs may deny the population
the needed diversity of genetic material with which to fashion the ultimate
solution to the problem. This concern can be partially addressed by deferring
the blending of parsimony into the fitness measure until relatively late in the
run (e.g., after attainment of some reasonably high level of fitness using the
original fitness measure) or until at least one or a certain number of solutions
(or satisfactory results) is found using the original fitness measure.
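To make the deferred-blending idea concrete, here is a minimal Python sketch. It is not a method given in the text: the additive formula, the `size_weight` value, and the names `blended_fitness`, `solutions_found`, and `min_solutions` are all hypothetical choices, and picking such values is exactly the kind of arbitrary decision the passage above warns about.

```python
def blended_fitness(standardized_fitness, program_size, solutions_found,
                    min_solutions=1, size_weight=0.01):
    """Return a fitness value where lower is better.

    Assumed (hypothetical) scheme: until at least `min_solutions` solutions
    have been found with the original fitness measure, fitness is the
    original standardized fitness alone. After that, an additive size
    penalty of `size_weight` per program point is blended in.
    """
    if solutions_found < min_solutions:
        return float(standardized_fitness)
    return standardized_fitness + size_weight * program_size


# Early in the run: size is ignored.
print(blended_fitness(3, program_size=100, solutions_found=0))
# After a solution exists: two perfect programs are ranked by size.
print(blended_fitness(0, program_size=296, solutions_found=1))
print(blended_fitness(0, program_size=100, solutions_found=1))
```

With these illustrative settings, a smaller perfect program scores better than a larger one once blending is switched on, while programs are compared on correctness alone before that point.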
The results of the experimental research reported herein, as reflected in main
point 4, indicate that there is a fourth way to achieve parsimony in genetic
programming: automatically defined functions. Parsimony appears to be an
emergent property of most runs of genetic programming with automatically
defined functions. The advantage of achieving parsimony by means of automatically defined functions is that this approach does not require any predefined arbitrary tradeoff between the competing elements of the fitness
measure and does not appear to be limited to one particular class of problems
(e.g., symbolic regression problems).
The Lawnmower Problem
The progression of parity problems in chapters 6 and 7 provides evidence in
favor of the proposition that automatically defined functions are beneficial in
terms of both computational effort and parsimony; however, the parity problems are constraining because they are so time-consuming. The lawnmower problem discussed in this chapter is a specially constructed, fast-running problem
designed to provide a flexible testbed for studying automatically defined functions. The lawnmower problem was specifically designed with the expectation that it would
• be much faster to run than the parity problem (it yields solutions with a
population size of 1,000, rather than 16,000 or 4,000),
• be hard enough that its problem environment has exploitable regularities,
• be hard enough to have interesting hierarchical decompositions,
• have a sufficiently rich and varied function set to enable the problem to be
solved in many distinctly different ways,
• be on the beneficial side of the breakeven point for computational effort,
• be on the beneficial side of the breakeven point for average structural
complexity,
• be scalable with a much finer granularity than merely the number of
arguments (3, 4, 5, and 6) to the parity function, and
• be so much faster to solve that we can say, in spite of all of the difficulties
and uncertainties inherent in measuring wallclock time, that this problem
is clearly on the beneficial side of the breakeven point for wallclock time
when automatically defined functions are used.
In addition to the above characteristics, the lawnmower problem illustrates
another interesting aspect of hierarchical computer programming. In the foregoing chapters, information was transmitted to the genetically evolved reusable subprograms solely by means of explicit arguments. The automatically
defined functions were usually repeatedly used with different instantiations
of these explicit arguments. When the transmitted values are received by the
automatically defined function, they are bound to dummy variables (formal
parameters) that appear locally inside the function. An alternative to the
explicit transmission of information to a subprogram is the implicit transmission of information by means of side effects on the state of the system. In the
lawnmower problem considered in this chapter, one of the two automatically
defined functions takes no explicit arguments at all.
8.1 THE PROBLEM
In the lawnmower problem, the goal is to find a program for controlling the
movement of a lawnmower so that the lawnmower cuts all the grass in the
yard. The desired program is to be executed exactly once, so the program
must contain within itself all the operations needed to solve the problem. The
lawnmower problem is scaled in terms of the size of the lawn.
We first consider a version of this problem in which the lawnmower operates in a discrete 8-by-8 toroidal square area of lawn that initially has grass in
all 64 squares. Later we will scale the lawn down to 48 and 32 squares and
scale it up to 80 and 96 squares and compare the results to the results obtained
for the 64-square lawn.
Each square of the lawn is uniquely identified by a vector of integers modulo
8 of the form (i,j), where $0 \le i, j \le 7$. The lawnmower starts at location (4,4)
facing north. Note that we use the usual mathematical notation, (4,4), to denote
a vector of numbers (rather than the style of LISP). The state of the lawnmower
consists of its location on one of the 64 squares of the lawn and the direction
in which it is facing. The lawn is toroidal in all four directions, so that whenever the lawnmower moves off the edge of the lawn, it reappears on the
opposite side.
The lawnmower is capable of turning left. It can also move forward one
square in the direction in which it is currently facing. Being a somewhat magical
lawnmower, it can jump by a specified displacement in the vertical and horizontal directions in the plane. Whenever the lawnmower moves onto a new
square (either by means of a single move or a jump), it mows all the grass, if
any, remaining in the single square on which it arrives. The lawnmower has
no sensors.
Figure 8.1 shows the 64 squares of the lawn. The origin (0,0) is in the upper
left corner. The numbering of the squares increases going down and going to
the right. There are no obstacles in the yard.
A human programmer writing a program to solve this problem would
almost certainly not solve it by tediously writing a sequence of 64 separate
mowing operations (and appropriate turning actions). Instead, a human programmer would exploit the considerable homogeneity and symmetry of this
problem environment by writing a program that mows a certain small area of
the lawn in a particular way, then repositions the lawnmower in some regular way, and then repeats the particular mowing action on the new area of the
lawn. That is, the human programmer would decompose the overall problem into a subproblem (i.e., mowing a small area), solve the subproblem, and
then repeatedly use the subproblem solution at different places on the lawn
in order to solve the overall problem.
Figure 8.1 Starting location and orientation of the lawnmower in a lawn with 64 squares.
8.2 PREPARATORY STEPS WITHOUT ADFs
The operations of turning left and mowing one square take no arguments.
Each of these operations changes the state of the lawnmower. It is largely a
matter of convention as to whether such zero-argument side-effecting functions are treated as members of the function set or as members of the terminal
set. For purposes of exposition, we adopt the convention throughout this book
of treating zero-argument side-effecting functions as terminals, but treating
zero-argument ADFs as functions. For purposes of programming, we treat all
zero-argument functions as terminals (appendix E).
Since it may be desirable to be able to manipulate the numerical location of
the lawnmower using arithmetic operations, random constants should be
available as ingredients of programs for solving this problem. The random
constants, $\Re_{v8}$, appropriate for this problem are vectors (i,j) composed of
two integers modulo 8. These vector random constants range over the 64
possibilities between (0,0) and (7,7).
Thus, the terminal set for this problem consists of two zero-argument side-effecting operators and random vector constants.
T = {(LEFT), (MOW), $\Re_{v8}$}.
The operator LEFT takes no arguments and turns the orientation of the
lawnmower counterclockwise by 90° (without moving the lawnmower). Since
the programs will be performing arithmetic operations, it is necessary that all
terminals and functions return a value that can serve as a legitimate argument to the arithmetic operations. Thus, to ensure closure, LEFT returns the
vector value (0,0).
The operator MOW takes no arguments and moves the lawnmower in the
direction it is currently facing and mows the grass, if any, in the square to
which it is moving (thereby removing all the grass, if any, from that square).
MOW does not change the orientation of the lawnmower. For example, if the
lawnmower is at location (1,3) and facing east, MOW increments the first
component (i.e., the i location) of the state vector of the lawnmower, thus
moving the lawnmower to location (2,3) with the lawnmower still facing east.
As a further example, if the lawnmower is at location (7,3) and facing east,
MOW moves the lawnmower to location (0,3) because of the toroidal geometry.
To ensure closure, MOW also returns the vector value (0,0).
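The behavior of LEFT and MOW on the toroidal lawn can be illustrated with a small Python sketch. The class name `Mower`, its field names, and the heading table are illustrative only. The text states that facing east increments the first component; the assumption that north decrements the second component follows from the statement that the numbering of squares increases going down.

```python
from dataclasses import dataclass, field

SIZE = 8  # 8-by-8 toroidal lawn

# Headings listed in counterclockwise order, so LEFT steps to the next entry.
# Each step is (di, dj): i grows to the right, j grows downward (assumed).
ORDER = ["north", "west", "south", "east"]
STEP = {"north": (0, -1), "west": (-1, 0), "south": (0, 1), "east": (1, 0)}


@dataclass
class Mower:
    i: int = 4
    j: int = 4
    heading: str = "north"
    mowed: set = field(default_factory=set)

    def left(self):
        """Turn counterclockwise by 90 degrees without moving; return (0,0)."""
        self.heading = ORDER[(ORDER.index(self.heading) + 1) % 4]
        return (0, 0)

    def mow(self):
        """Move one square forward (wrapping modulo 8), mow it, return (0,0)."""
        di, dj = STEP[self.heading]
        self.i = (self.i + di) % SIZE
        self.j = (self.j + dj) % SIZE
        self.mowed.add((self.i, self.j))
        return (0, 0)


m = Mower(i=7, j=3, heading="east")
m.mow()
print((m.i, m.j))  # the toroidal wrap-around example from the text
```

Both LEFT and MOW return the vector (0,0) here so that their values can feed into vector arithmetic, which is the closure requirement described above.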
The function set consists of
F = {V8A, FROG, PROGN}
with an argument map of
{2, 1, 2}.
V8A is the two-argument vector addition function modulo 8. For example,
(V8A (1,2) (3,7)) returns the value (4,1).
FROG is a one-argument operator that causes the lawnmower to move
relative to the direction it is currently facing by an amount specified by its
vector argument and to mow the grass, if any, in the square on which the
lawnmower arrives (thereby removing all the grass, if any, from that square).
FROG does not change the orientation of the lawnmower. For example, if
the lawnmower is at location (1,2) and is facing east, (FROG (5,3)) causes
the lawnmower to end up at location (6,5) with the lawnmower still facing
east. The grass, if any, at the location (6,5) is mowed. FROG acts as the identity operator on its argument so, for example, (FROG (5,3)) returns the
value (5,3).
The solution to this problem does not, of course, require both the MOW and
FROG operators. The function set was intentionally enriched by the inclusion
of both operators so there would be many alternative approaches to solving
the problem and to permit solutions combining local activity (using MOW) with
nonlocal activity (using FROG).
The goal is to mow all 64 squares of grass with a single execution of the
program. The movement of the lawnmower is terminated when either the
lawnmower has executed a total of 100 LEFT turns or a total of 100 movement-causing operations (i.e., MOWs or FROGs). The raw fitness of a particular
program is the amount of grass (from 0 to 64) mowed within this allowed
amount of time. Since the yard contains no obstacles and the toroidal topology of the yard is perfectly homogeneous and symmetrical, it is only necessary to measure fitness over one fitness case for this problem.
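The evaluation of a program and the raw fitness measure just described can be sketched in Python as follows. This is only a sketch under stated assumptions, not the implementation used for the runs reported here. Programs are written as nested tuples such as `("PROGN", ("MOW",), ("LEFT",))`, with vector constants as plain pairs; the names `Lawn`, `evaluate`, and `raw_fitness` are illustrative. Two simplifications are assumed: once either limit of 100 is reached, all further state-changing operations are ignored, and FROG adds its displacement directly to the location (which matches the facing-east example in the text, but the handling of other headings is not spelled out in this section).

```python
SIZE = 8
MAX_TURNS = 100   # limit on LEFT operations
MAX_MOVES = 100   # limit on MOW and FROG operations

# Unit steps (di, dj) in counterclockwise order: north, west, south, east.
STEPS = [(0, -1), (-1, 0), (0, 1), (1, 0)]


class Lawn:
    def __init__(self):
        self.pos = (4, 4)      # starting location
        self.heading = 0       # index into STEPS; 0 means north
        self.mowed = set()
        self.turns = 0
        self.moves = 0

    def active(self):
        return self.turns < MAX_TURNS and self.moves < MAX_MOVES

    def jump(self, di, dj):
        self.pos = ((self.pos[0] + di) % SIZE, (self.pos[1] + dj) % SIZE)
        self.mowed.add(self.pos)
        self.moves += 1


def evaluate(node, lawn):
    if not isinstance(node[0], str):
        return node                          # vector constant
    op, args = node[0], node[1:]
    if op == "PROGN":                        # evaluate in order, return last
        value = (0, 0)
        for arg in args:
            value = evaluate(arg, lawn)
        return value
    if op == "V8A":
        a, b = evaluate(args[0], lawn), evaluate(args[1], lawn)
        return ((a[0] + b[0]) % SIZE, (a[1] + b[1]) % SIZE)
    if op == "LEFT":
        if lawn.active():
            lawn.heading = (lawn.heading + 1) % 4
            lawn.turns += 1
        return (0, 0)
    if op == "MOW":
        if lawn.active():
            lawn.jump(*STEPS[lawn.heading])
        return (0, 0)
    if op == "FROG":
        v = evaluate(args[0], lawn)
        if lawn.active():
            lawn.jump(*v)                    # assumption: heading is ignored
        return v                             # identity on its argument
    raise ValueError(f"unknown operator: {op}")


def raw_fitness(program):
    lawn = Lawn()
    evaluate(program, lawn)
    return len(lawn.mowed)


# A column-by-column program: mow a column, shift one square west, repeat.
column = [("MOW",)] * 8
shift = [("LEFT",), ("MOW",), ("LEFT",), ("LEFT",), ("LEFT",)]
ops = []
for k in range(8):
    ops += column
    if k < 7:
        ops += shift
print(raw_fitness(("PROGN", *ops)))  # 64
```

The column-by-column program at the end uses 71 movement-causing operations and 28 turns, so it stays inside both limits and mows every square.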
A population size of only 1,000 appears to be satisfactory for this problem.
Table 8.1 summarizes the key features of the 64-square lawnmower problem in an unobstructed yard without automatically defined functions.
8.3 LAWN SIZE OF 64 WITHOUT ADFs
The only way to write a computer program to mow all 64 squares of the lawn
with the available movement-causing and turning operators involves tediously
writing a program consisting of at least 64 MOWs or FROGs so that all 64 squares
of the lawn are mowed. One possible orderly way of writing this tedious
program involves mowing all eight squares of lawn in the vertical column
beginning at the starting location (4,4), turning left upon returning to (4,4),
moving one square to the west, turning left three times so as to face north
again, and then mowing the remaining seven squares of lawn in the new
vertical column. This process can then be continued for the remaining columns.
Table 8.1 Tableau without ADFs for the 64-square lawnmower problem.
Objective: Find a program to control a lawnmower so that it mows the grass on all 64 squares of lawn in an unobstructed toroidal yard.
Terminal set without ADFs: (LEFT), (MOW), and the random constants $\Re_{v8}$.
Function set without ADFs: V8A, FROG, and PROGN.
Fitness cases: One fitness case consisting of a toroidal lawn with 64 squares, each initially containing grass.
Raw fitness: Raw fitness is the amount of grass (from 0 to 64) mowed within the maximum allowed number of state-changing operations.
Standardized fitness: Standardized fitness is the total number of squares (i.e., 64) minus raw fitness.
Hits: Same as raw fitness.
Wrapper: None.
Parameters: M = 1,000. G = 51.
Success predicate: A program scores the maximum number of hits.
The following handwritten 100%-correct 100-point program implements
the above approach using the ordinary PROGN LISP connective that is capable
of taking an indefinite number of arguments:
(pRoGN (MOW) (Mow) (Mow) (Mow) (Mow) (Mow) (Mow) (Mow)
(LEFT) (MOW) (LEFT) (LEFT) (LEFT)
(MOW) (MOW) (Mow) (Mow) (MOW) (MOW) (MOW) (MOW)
(LEFT) (MOW) (LEFT) (LEFT) (LEFT)
(Mow) (Mow) (Mow) (Mow) (Mow) (Mow) (Mow) (Mow)
(LEFT) (MOW) (LEFT) (LEFT) (LEFT)
(MOW) (Mow) (Mow) (Mow) (Mow) (Mow) (Mow) (Mow)
(LEFT) (MOW) (LEFT) (LEFT) (LEFT)
(MOW) (MOW) (MOW) (MOW) (MOW) (MOW) (MOW) (MOW)
(LEFT) (Mo\^/) (LEFT) (LEFT) (LEFT)
(MOW) (MOW) (MOW) (MOW) (MOW) (MOW) (MOW) (MOW)
(LEFT) (MOW) (LEFT) (LEFT) (LEFT)
(MOW) (MOVJ) (MOW) (MOW) (MOW) (MOW) (MOW) (MOW)
(LEFT) (MOW) (LEFT) (LEFT) (LEFT)
(MOW) (MOW) (MOW) (Mow) (MOW) (MOW) (MOW) (MOW) ) .
If one uses only the two-argument PROGN that is actually available in the
function set specified above, the 197-point program below is an equivalent
implementation of the above. This program requires an average of 3.08 points
for each of the 64 squares of lawn.
( (PROGN (PROGN (PROGN (PROGN (PROGN (PROGN (PROGN (MOW) (MOW) )
(PROGN (MOW) (MOW) ) )
(PROGN (PROGN (MOW) (MOW) )
(PRocN (PR.GN l#ffi lf:Hlo,'ffiI*i '1"",,,
,,
(PRocN (PRocN l:il:il liH5:l lffi;'li,l., ,
(PRocN (MOW) (Mow) ) )
(PROGN (PROGN (MOW) (MOW) )
(PRocN (PRocN l#ffi ll:51''ffi3*i '1,"",,,,
(PROGN (LEFT) (LEFT) ) ) ) )
(PROGN (PROGN (PROGN (PROGN (PROGN (MOW) (MOW)
(PROGN (MOW) (MOW)))
(PROGN (PROGN (MOW) (MOW)
(pRocN (Mow) (Mow) ) ) )
{PROGN (PROGN (LEFT) (PROGN (MOW) (LEFT) ) )
(PROGN (LEFr) (LEFr) ) ) )
/ pFor:NI 1 DP^.r\r / DDOGN (PROGN (MOW) (MOW) )
(PROGN (Mol^i) (MOW) ) )
(PROGN (pROGN (MOW) (MOW) )
(PROGN (MOW) (MOW))))
(pRocN (pRocN (LEFT) (PROGN (MOW) (LEFT) ) )
(PROGN (LEFT) (LEFT) ) ) ) ) )
(pRocN (pRocN (pRocN (PROGN (PROGN (pROGN (MOW) (MOW)
(PROGN (MOW) (MOW) ) )
(PROGN (PROGN (MOW) (MOW)
(PROGN (MOW) (MOW))))
(PROGN (PROGN (LEFT) (PROGN (MOW) (LEFT) ) )
(PROGN (LEFT) (LEFT) ) ) )
(PROGN (PROGN (PROGN (PROGN (MOW) (MOW) )
(PROGN (MOW) (MOW) ) )
(PROGN (PROGN (MOW) (MOW) )
(PRocN (pRocN l::ffi lH5:.o'ffi3*l '1"""',,,
(pRocN (pRocN (pRocN l:13:il [i:::l l;ffi;' iJ:*, ,
(PROGN (MOW) (MOW) ) )
(PROGN (PROGN (MOW) (MOW) )
(PRocN (pRocN l::ff| lfil:^.'ffi5,l,i '1""",,,,
(PRocN (PRocN lilS:il;ffil',J:#il' "'
(PRocN l:il:n lffi#l lil:#i i,
(PROGN (MOW) (MOW) )))))) .
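The point counts quoted above (100 points with an n-ary PROGN, 197 points with a two-argument PROGN) can be checked with a short Python sketch. The helper names `binarize` and `points` and the nested-tuple representation are assumptions made for illustration. The balanced split used here does not necessarily produce the same tree shape as the listing above, but any binary tree over the same 99 operations has 98 PROGN nodes and therefore the same size.

```python
def binarize(items):
    """Join a list of subprograms using only two-argument PROGNs."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return ("PROGN", binarize(items[:mid]), binarize(items[mid:]))


def points(node):
    """Count functions and terminals in a nested-tuple program."""
    return 1 + sum(points(child) for child in node[1:])


# The 99 operations of the handwritten program: eight columns of eight MOWs,
# separated by seven repositioning sequences.
column = [("MOW",)] * 8
shift = [("LEFT",), ("MOW",), ("LEFT",), ("LEFT",), ("LEFT",)]
ops = []
for k in range(8):
    ops += column
    if k < 7:
        ops += shift

n_ary = ("PROGN", *ops)
binary = binarize(ops)
print(points(n_ary), points(binary), round(points(binary) / 64, 2))
```

The last value printed is the average number of points per square of lawn for the two-argument version.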
As one would expect, genetic programming is capable of solving this problem without automatically defined functions.
As usual, the randomly created programs of generation 0 are not very
effective at mowing much of the lawn. For example, the best of generation 0
mows only eight squares of the lawn.
(PROGN (PROGN (FROG (V8A (MOW) (1,7))) (FROG (FROG (7,4))))
(PROGN (V8A (PROGN (LEFT) (MOW)) (FROG (MOW))) (PROGN (FROG
(MOW)) (V8A (4,0) (MOW))))).
Figure 8.2 shows the trajectory of this 22-point best-of-generation program.
As can be seen, the lawnmower MOWs and FROGs around the lawn in its
not-too-successful effort to mow all 64 squares before the allowed amount of
time or the program is exhausted. In practice, this program exhausts itself
before it runs out of time. The eight mowed squares are unshaded and the 56
unmowed squares are shaded in gray.
Figure 8.2 8-scoring trajectory of the best of generation 0 of one run of the 64-square lawnmower
problem without ADFs.
Subtrees of somewhat effective programs in this problem typically mow
small portions of the lawn. If two programs are selected from the population based on their fitness (i.e., such that more fit programs are more likely
to be selected than less fit programs), both of the selected programs will
usually mow more lawn than a randomly selected program of their generation. Moreover, a randomly selected subtree from either of these selected individuals will, on average, mow more lawn than a randomly
selected subtree from a randomly selected program for its generation. Thus,
the effect of the crossover operation in this problem is to create new programs which will, on average, mow an increasing and above-average
amount of lawn.
A genealogical audit trail can provide further insight into how genetic programming works. An audit trail consists of a record of the ancestors of a given
individual and of each genetic operation that was performed on the ancestors in order to produce the individual. For the crossover operation, the audit
trail includes the particular crossover points chosen within each parent. When
automatically defined functions are involved, the crossover point can be in
either the body of the result-producing branch or in one of the function-defining branches.
The way that the crossover operation creates offspring programs that mow
an increasing and above-average amount of lawn is illustrated by an examination of the genealogical audit trail for the best-of-generation program of
the next generation (i.e., generation 1) of this run.
Parent A of the best of generation 1 is the 8th best program in the population for generation 0. This parent from generation 0 consists of the following
23-point program mowing seven squares of lawn:
1 (PROGN (PROGN (7,6)
2               (PROGN (V8A (MOW) (MOW)) (V8A (LEFT) (MOW))))
3        (V8A (PROGN (MOW) (V8A (MOW) (LEFT)))
4             (PROGN (PROGN (7,5) (3,1))
5                    (V8A (MOW) (MOW))))).
This program contains seven MoW operations and achieves a raw fitness
of7.
Figure 8.3 shows the U-shaped 7-scoring trajectory of parent A from generation 0. The seven mowed squares are unshaded and the 57 unmowed
squares are shaded in gray. The lawnmower starts at the starting location
(4,4), mows north two squares, turns left, mows west three squares to the
square marked by the arrow labeled "1," and turns left (south). This activity
corresponds to the first three lines of the five-line program above. The fifth
line of the program then causes the lawnmower to mow two additional squares
to the south.
Parent B of the best of generation 1 is the 31st best program of generation 0
and consists of the following 14-point program mowing five squares of lawn
from generation 0:
(FROG (V8A (V8A (FROG (0,4)) (FROG (4,2)))
           (PROGN (PROGN (3,2) (MOW))
                  (V8A (MOW) (1,4))))).
This program contains two MOW and three FROG operations and scores a raw
fitness of 5.
Figure 8.4 shows the gyrating 5-scoring trajectory of parent B from generation 0. The five mowed squares are unshaded and the 59 unmowed squares
are shaded in gray.
Note that neither parent A nor parent B is as good as the best of generation
0 (which mowed eight squares of lawn). However, both parents mow an above-average amount of lawn, as is usually the case when parents are selected from
the population on the basis of their fitness.
The best of generation 1 of this run mows 11 squares of lawn. This 33-point
program is
1 (PROGN (PROGN (7,6)
2               (PROGN (V8A (MOW) (MOW)) (V8A (LEFT) (MOW))))
3        (V8A (PROGN (MOW) (V8A (MOW) (LEFT)))
4             (PROGN (V8A (V8A (FROG (0,4))
5                              (FROG (4,2)))
6                         (PROGN (PROGN (3,2) (MOW))
7                                (V8A (MOW) (1,4))))
8                    (V8A (MOW) (MOW))))).
This eight-line offspring program contains nine MOW and two FROG operations and scores a raw fitness of 11 because the lawnmower reaches 11 different squares with those 11 operations. The crossover fragment contributed by
parent B is in boldface above. It is inserted into parent A in lieu of line 4 (the
underlined portion) of parent A shown earlier.
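The crossover step traced in this audit trail can be reproduced with a small Python sketch. Programs are written here as nested lists with vector constants as tuples; the helper names (`points`, `count_op`, `get_subtree`, `replace_subtree`) and the index paths are illustrative choices for this sketch. The crossover points are taken from the example rather than chosen at random, and only the one offspring discussed in the text is built.

```python
import copy


def points(node):
    """Count functions and terminals; a tuple such as (7, 6) is one constant."""
    if not isinstance(node[0], str):
        return 1
    return 1 + sum(points(child) for child in node[1:])


def count_op(node, name):
    """Count occurrences of the operator `name` in a program."""
    if not isinstance(node[0], str):
        return 0
    return (node[0] == name) + sum(count_op(child, name) for child in node[1:])


def get_subtree(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, fragment):
    """Return a copy of `tree` with the subtree at `path` replaced."""
    result = copy.deepcopy(tree)
    get_subtree(result, path[:-1])[path[-1]] = copy.deepcopy(fragment)
    return result


parent_a = ["PROGN",
            ["PROGN", (7, 6),
             ["PROGN", ["V8A", ["MOW"], ["MOW"]], ["V8A", ["LEFT"], ["MOW"]]]],
            ["V8A", ["PROGN", ["MOW"], ["V8A", ["MOW"], ["LEFT"]]],
             ["PROGN", ["PROGN", (7, 5), (3, 1)], ["V8A", ["MOW"], ["MOW"]]]]]

parent_b = ["FROG",
            ["V8A", ["V8A", ["FROG", (0, 4)], ["FROG", (4, 2)]],
             ["PROGN", ["PROGN", (3, 2), ["MOW"]], ["V8A", ["MOW"], (1, 4)]]]]

fragment = get_subtree(parent_b, [1])                 # everything below the root FROG
offspring = replace_subtree(parent_a, [2, 2, 1], fragment)  # replaces (PROGN (7,5) (3,1))

print(points(parent_a), points(parent_b), points(offspring))
print(count_op(offspring, "MOW"), count_op(offspring, "FROG"))
```

Removing the 3-point subtree from the 23-point parent A and inserting the 13-point fragment from parent B gives the 33-point offspring with nine MOW and two FROG operations, matching the counts in the text.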
Figure 8.3 U-shaped 7-scoring trajectory of parent A from generation 0 of one run of the
64-square lawnmower problem without ADFs.
Figure 8.4 Gyrating 5-scoring trajectory of parent B from generation 0 of one run of the 64-square lawnmower problem without ADFs.
Figure 8.5 shows the 11-scoring trajectory of the best of generation 1.
The 11 mowed squares are unshaded and the 53 unmowed squares are
shaded in gray. The first three lines of this eight-line offspring program
are identical to the first three lines of the five-line program for parent A.
These three lines cause the lawnmower, starting at (4,4), to mow north
two squares, turn left, mow west three squares to the square marked by
the arrow labeled "1," and turn left (south). The lawnmower is thus at the
square marked by the arrow labeled "1." The crossover operation inserts
the boldface code constituting almost all of lines 4 through 7 of the eight-line offspring. This code comes from parent B. The lawnmower's action
between arrow 1 and arrow 2 comes from the crossover fragment contributed by parent B. The final two MOWs (southward) correspond to line
five of parent A and line eight of the offspring.
Figure 8.5 11-scoring trajectory of the best of generation 1 of one run of the 64-square problem
without ADFs.
Figure 8.6 Fitness curves for the 64-square lawnmower problem without ADFs. [Curves: worst of generation, average, and best of generation; horizontal axis: generation, 0 to 34.]
No program from generation 0 mows as many as 11 squares of lawn.
Indeed, the best from generation 0 mows only eight squares. Parents A
and B mow only seven and five squares, respectively. They are above average, but not the best, of their generation. This particular offspring in
generation 1 mows 11 squares because the creative effect of the crossover
operation directed the search of program space into a new and promising
region. In this instance, the new area of the search space contains an offspring that achieves the new higher level of fitness of 11. The 11-scoring
best of generation 1 follows an irregular trajectory in its attempt to solve
the problem.
This run is a typical run of genetic programming in that as one proceeds
from generation to generation, the fitness of the best-of-generation program
and the average fitness of the population as a whole generally improve. For
example, for generation 5, the best-of-generation program consists of 119 points
and mows 32 of the 64 squares of lawn, as shown below:
(V8A (PROGN (PROGN (PROGN (PROGN (MOW) (MOW)) (V8A (MOW) (MOW)))
(PROGN (PROGN (MOW) (0,5)) (V8A (FROG (PROGN (PROGN (V8A (V8A
(LEFT) (MOW)) (PROGN (3,1) (MOW))) (V8A (PROGN (MOW) (LEFT))
(V8A (V8A (FROG (0,4)) (FROG (4,2))) (PROGN (PROGN (3,2) (MOW))
(V8A (MOW) (1,4)))))) (PROGN (FROG (PROGN (PROGN (PROGN (PROGN
(MOW) (MOW)) (FROG (LEFT))) (PROGN (MOW) (V8A (MOW) (MOW))))
(PROGN (V8A (PROGN (0,3) (7,2)) (V8A (MOW) (MOW))) (PROGN (V8A
(MOW) (MOW)) (PROGN (LEFT) (MOW)))))) (V8A (FROG (7,2)) (V8A
(7,3) (5,5)))))) (V8A (V8A (V8A (3,7) (LEFT)) (PROGN (6,7)
(LEFT))) (PROGN (V8A (5,6) (LEFT)) (PROGN (MOW) (MOW)))))) (V8A
(PROGN (MOW) (V8A (MOW) (LEFT))) (PROGN (PROGN (7,5) (3,1)) (V8A
(MOW) (MOW))))) (V8A (PROGN (FROG (MOW)) (FROG (6,0))) (PROGN
(PROGN (MOW) (MOW)) (V8A (MOW) (MOW))))).
This considerably more successful program is much larger than its predecessors. Indeed, increasing size is necessary for improved performance in this
problem.
Figure 8.6 presents the fitness curves for this run showing, by generation, the standardized fitness of the best-of-generation program, the standardized fitness of the worst-of-generation program, and the average of
the standardized fitness for the population as a whole. The figure starts at
generation 0 and ends at generation 34 when a 100% effective lawnmower
(i.e., one with a standardized fitness of 0) is evolved on this particular run.
As one progresses from generation to generation in a typical run of
genetic programming, the fitness of the population as a whole generally
improves. The hits histogram is a useful monitoring tool for visualizing
the progressive learning of the population as a whole during a particular
run. The horizontal axis of the hits histogram represents the number of hits
(0 to 64) while the vertical axis represents the number of individuals in the
population (0 to 1,000) scoring that number of hits.
Figure 8.7 shows the hits histograms for generations 0,5,20, and 34 of
this run. Note the left-to-right undulating movement of both the high point
and the center of mass of these three histograms over the generations.
This "slinky" movement reflects the improvement of the population as a
whole.
Figure 8.8 shows the structural complexity of the best-of-generation program and the average of the values of structural complexity of the programs in the population as a whole for this run of the 64-square lawnmower
problem without automatically defined functions. The structural complexity of the best of generation 0 is 23 and the average of the structural complexity of all the programs in the population for generation 0 is 9.7.
The following 296-point individual achieving a raw fitness of 64 emerged
on generation 34 of this run without automatically defined functions:
Figure 8.7 Hits histograms for generations 0, 5, 20, and 34 of the 64-square lawnmower problem without ADFs.
Figure 8.8 Structural complexity curves of a run of the 64-square lawnmower problem without ADFs.
(V8A (V8A (V8A (FROG (PROGN (PROGN (V8A (MOW) (MOW)) (FROG
(3,2))) (PROGN (V8A (PROGN (V8A (PROGN (PROGN (MOW) (2,4)) (FROG
(5,6))) (PROGN (V8A (MOW) (6,0)) (FROG (2,2)))) (V8A (MOW)
(MOW))) (PROGN (V8A (PROGN (PROGN (0,3) (7,2)) (FROG (5,6)))
(PROGN (V8A (MOW) (6,0)) (FROG (2,2)))) (V8A (MOW) (MOW))))
(PROGN (FROG (MOW)) (PROGN (PROGN (PROGN (V8A (MOW) (MOW)) (FROG
(LEFT))) (PROGN (MOW) (V8A (MOW) (MOW)))) (PROGN (V8A (PROGN
(0,3) (7,2)) (V8A (MOW) (MOW))) (PROGN (V8A (MOW) (MOW)) (PROGN
(LEFT) (MOW))))))))) (V8A (PROGN (V8A (PROGN (PROGN (MOW) (2,4))
(FROG (5,6))) (PROGN (V8A (MOW) (6,0)) (FROG (2,2)))) (V8A (MOW)
(MOW))) (V8A (FROG (LEFT)) (FROG (MOW))))) (V8A (FROG (V8A
(PROGN (V8A (PROGN (V8A (MOW) (MOW)) (FROG (3,7))) (V8A (PROGN
(MOW) (LEFT)) (V8A (MOW) (5,3)))) (PROGN (PROGN (V8A (PROGN (LEFT)
(MOW)) (V8A (1,4) (LEFT))) (PROGN (FROG (MOW)) (V8A (MOW)
(3,7)))) (V8A (PROGN (FROG (MOW)) (V8A (LEFT) (MOW))) (V8A (FROG
(1,2)) (V8A (MOW) (LEFT)))))) (PROGN (V8A (FROG (3,1)) (V8A
(FROG (PROGN (PROGN (V8A (MOW) (MOW)) (FROG (3,2))) (FROG (FROG
(5,0))))) (V8A (PROGN (FROG (MOW)) (V8A (MOW) (MOW))) (V8A (FROG
(LEFT)) (FROG (MOW)))))) (PROGN (PROGN (PROGN (PROGN (LEFT)
(MOW)) (V8A (MOW) (3,7))) (V8A (V8A (MOW) (MOW)) (PROGN (LEFT)
(LEFT)))) (V8A (FROG (PROGN (3,0) (LEFT))) (V8A (PROGN (MOW)
(LEFT)) (FROG (5,4)))))))) (PROGN (FROG (V8A (PROGN (V8A (PROGN
(PROGN (V8A (PROGN (PROGN (MOW) (2,4)) (FROG (5,6))) (PROGN (V8A
(MOW) (1,2)) (FROG (2,2)))) (V8A (MOW) (MOW))) (FROG (3,7)))
(V8A (PROGN (PROGN (MOW) (2,4)) (FROG (5,6))) (PROGN (V8A (MOW)
(6,0)) (FROG (2,2))))) (PROGN (PROGN (V8A (FROG (MOW)) (V8A
(1,4) (LEFT))) (PROGN (FROG (MOW)) (V8A (MOW) (3,7)))) (V8A
(PROGN (FROG (MOW)) (V8A (LEFT) (MOW))) (V8A (FROG (1,2)) (V8A
(MOW) (LEFT)))))) (PROGN (V8A (PROGN (FROG (2,4)) (V8A (MOW)
(MOW))) (V8A (FROG (MOW)) (LEFT))) (PROGN (3,0) (LEFT))))) (FROG
(V8A (7,4) (MOW)))))) (V8A (V8A (PROGN (MOW) (4,3)) (V8A (LEFT)
(6,1))) (MOW))).
Figure 8.9 First partial trajectory of 296-point program for operations 0 through 30 without ADFs.
This 296-point program solves the problem by agglomerating enough erratic
movements so as to cover the entire area of the lawn within the allowed maximum number of operations. In fact, the way that this program solves the
problem is so tedious and convoluted that it can be easily visualized only
after dividing the trajectory of the lawnmower into three epochs.
Figure 8.9 shows a partial trajectory of this best-of-run 296-point individual
for the first epoch consisting of mowing operations 0 through 30; figure 8.10
shows a partial trajectory for the second epoch involving mowing operations
31 through 60; and figure 8.11 shows the third epoch involving mowing
operations 61 through 85. Since all 64 squares are mowed in these figures,
they are all unshaded.
As can be seen, even though the problem environment contains considerable regularity in that it requires mowing all 64 squares of the lawn in an
unobstructed toroidal yard, this solution involves a tangled agglomeration of
irregular movements. For example, between mowing operations 2 and 3, the
lawnmower FROGs up two rows and three columns to the right; between
operations 4 and 5, the mower FROGs up six rows and three columns to the
left; and between operations 6 and 7, the mower FROGs up two rows (i.e., down
six) and two columns to the right.
There is a close relationship between the size of a program and its fitness.
Since raw fitness is higher for better individuals for this problem, this relationship can be seen by comparing structural complexity to raw fitness (rather
than standardized fitness).
Figure 8.12 shows, by generation, the raw fitness and the structural complexity of the best-of-generation program for this run without automatically
defined functions. The vertical axis on the left of the figure runs between 0
and the number of squares of lawn (64). The vertical axis on the right runs
between 0 and 296 (so that the graph of structural complexity and the graph
of raw fitness intersect at generation 34 for the known 100%-correct 296-point
program for this run). The superimposition of these two graphs in this way
shows the close relationship between structural complexity and fitness for
this particular problem when it is run without automatically defined functions.
Figure 8.10 Second partial trajectory of 296-point program for operations 31 through 60 without ADFs.
Figure 8.11 Third partial trajectory of 296-point program for operations 61 through 85 without ADFs.
Figure 8.12 Superimposition of the raw fitness and structural complexity of the best-of-generation programs for the 64-square lawnmower problem without ADFs.
The average structural complexity, S_without, of the 100%-correct programs
from the 35 successful runs (out of 38 runs) of the 64-square lawnmower problem without automatically defined functions is 280.8 points. It takes an average of about four and a half functions and terminals in these program trees to
mow one square of lawn. The successful programs are large because they
make no use of the inherent regularity of the problem environment.
Figure 8.13 presents the performance curves based on the 38 runs of the
64-square lawnmower problem without automatically defined functions.
Only 3% of the runs are successful by generation 17. The cumulative probability of success, P(M,i), is 92% by generation 49 and 92% by generation
50. The two numbers in the oval indicate that if this problem is run through
to generation 49, processing a total of E_without = 100,000 individuals (i.e.,
1,000 x 50 generations x 2 runs) is sufficient to yield a solution to this
problem with 99% probability.
See also Koza 1993c, 1994a.
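The arithmetic behind the two numbers in the oval can be checked with a short sketch. It assumes the usual relation for the number of independent runs, R(z) = ceiling(log(1 - z) / log(1 - P(M,i))), and computes the number of individuals as M x (i + 1) x R(z). The function names are illustrative only and do not come from the text.

```python
import math


def runs_required(p_success: float, z: float = 0.99) -> int:
    """Independent runs R(z) needed so that at least one succeeds with probability z."""
    if p_success <= 0.0:
        raise ValueError("no run succeeded, so R(z) is undefined")
    if p_success >= 1.0:
        return 1  # every run succeeds, so a single run suffices
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(pop_size: int, generation: int,
                           p_success: float, z: float = 0.99) -> int:
    """M x (i + 1) x R(z): generation 0 counts, hence the i + 1."""
    return pop_size * (generation + 1) * runs_required(p_success, z)


# 64-square lawn without ADFs: P(M,i) = 92% at generation 49.
effort_64 = individuals_to_process(1000, 49, 0.92)
```

With P(M,i) = 92%, two runs are required, which reproduces the figure of 1,000 x 50 x 2 = 100,000 individuals quoted above.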
8.4 LAWN SIZE OF 32 WITHOUT ADFs
When the size of the problem is scaled down by 50% from 64 to 32 squares
of lawn (an 8-by-4 configuration), the average structural complexity,
S_without, of the 100%-correct programs from the 64 successful runs (out of
64 runs) without automatically defined functions is 145.0. This is a drop
from the value of 280.8 for the 64-square lawn; however, it still takes an
average of about four and a half functions and terminals in the program
trees to mow one square of lawn.
Figure 8.13 Performance curves for the 64-square lawnmower problem showing that
E_without = 100,000 without ADFs.
Figure 8.14 Performance curves for the 32-square lawnmower problem showing that
E_without = 19,000 without ADFs.
As one would expect, the computational effort decreases substantially.
Figure 8.14 presents the performance curves based on the 64 runs of
the 32-square lawnmower problem without automatically defined functions.
The cumulative probability of success, P(M,i), is 100% by generation 18. The
two numbers in the oval indicate that if this problem is run through to generation 18, processing a total of E_without = 19,000 individuals (i.e., 1,000 x 19
generations x 1 run) is sufficient to yield a solution to this problem with 99%
probability.
8.5 LAWN SIZE OF 48 WITHOUT ADFs
When the lawn size is 48 (an 8-by-6 configuration), the average structural
complexity, S_without, of the 100%-correct programs from the 39 successful
runs (out of 43 runs) without automatically defined functions is 217.6 (i.e.,
about 4.5 times the lawn size).
Figure 8.15 presents the performance curves based on the 43 runs of the
48-square lawnmower problem without automatically defined functions. The
cumulative probability of success, P(M,i), is 91% by generation 27 and 98%
by generation 50. The two numbers in the oval indicate that if this problem is
run through to generation 27, processing a total of E_without = 56,000 individuals (i.e., 1,000 x 28 generations x 2 runs) is sufficient to yield a solution to this
problem with 99% probability.
8.6 LAWN SIZE OF 80 WITHOUT ADFs
When the lawn size is 80 (an 8-by-10 configuration), the average structural complexity, S_without, of the 100%-correct programs from the 32 successful runs (out of 90 runs) without automatically defined functions is
366.1 (i.e., about 4.6 times the lawn size).
Figure 8.16 presents the performance curves based on the 90 runs of the
80-square lawnmower problem without automatically defined functions. The
cumulative probability of success, P(M,i), is 35.6% by generation 50. The
two numbers in the oval indicate that if this problem is run through to generation 50, processing a total of E_without = 561,000 individuals (i.e., 1,000 x 51
generations x 11 runs) is sufficient to yield a solution to this problem with
99% probability.
8.7 LAWN SIZE OF 96 WITHOUT ADFs
When the lawn size is 96 (an 8-by-12 configuration), the average structural
complexity, S_without, of the 100%-correct programs from the 14 successful runs
(out of 284 runs) without automatically defined functions is 408.8 (i.e., about
4.3 times the lawn size). The progression in values of average structural complexity, S_without, of the solutions to the lawnmower problem with lawn sizes
of 32, 48, 64, 80, and 96 is 145.0, 217.6, 280.8, 366.1, and 408.8, respectively.
Thus, as the size of the lawn increases in equal increments of 16 squares, the
size of the solutions becomes more and more unwieldy.
Figure 8.15 Performance curves for the 48-square lawnmower problem showing that
E_without = 56,000 without ADFs.
Figure 8.16 Performance curves for the 80-square lawnmower problem showing that
E_without = 561,000 without ADFs.
Figure 8.17 Performance curves for the 96-square lawnmower problem showing that
E_without = 4,692,000 without ADFs.
Figure 8.17 presents the performance curves based on the 284 runs of
the 96-square lawnmower problem without automatically defined functions. The cumulative probability of success, P(M,i), is 4.9% by generation 50. The two numbers in the oval indicate that if this problem is
run through to generation 50, processing a total of E_without = 4,692,000
individuals (i.e., 1,000 x 51 generations x 92 runs) is sufficient to yield a solution to this problem with 99% probability.
The progression in values of computational effort, E_without, for the
lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96 is 19,000, 56,000,
100,000, 561,000, and 4,692,000, respectively. Thus, as the size of the lawn
increases, dramatically more computational effort is required to yield a solution to the problem without automatically defined functions.
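The two progressions can be tabulated side by side. The sketch below only re-derives ratios from the values quoted in sections 8.3 through 8.7 (using 366.1 for the 80-square lawn, the value given in section 8.6); the variable names are illustrative.

```python
lawn_sizes = [32, 48, 64, 80, 96]
avg_complexity = [145.0, 217.6, 280.8, 366.1, 408.8]   # S_without, in points
effort = [19_000, 56_000, 100_000, 561_000, 4_692_000]  # E_without, in individuals

# Points of program needed per square of lawn: stays near 4.5 at every size.
points_per_square = [round(s / n, 1) for s, n in zip(avg_complexity, lawn_sizes)]

# Growth of computational effort from one lawn size to the next.
effort_growth = [round(b / a, 2) for a, b in zip(effort, effort[1:])]

for n, s, pps, e in zip(lawn_sizes, avg_complexity, points_per_square, effort):
    print(f"{n:3d} squares: S = {s:6.1f} ({pps} points per square), E = {e:>9,}")
```

Program size grows roughly in proportion to the lawn (about 4.5 points per square throughout), whereas the computational effort grows by a factor of about 247 while the lawn merely triples in size.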
8.8 PREPARATORY STEPS WITH ADFs
Each of the solutions presented in the previous section for solving the
lawnmower problem without automatically defined functions contained at
least 64 MOWs or FROGs when the lawn size is 64. However, a human programmer would never consider solving this problem in this tedious way.
Instead, a human programmer would write a program that first mows a certain small subarea of the lawn in some orderly way; the lawnmower would
then be repositioned to a new subarea of the lawn in some orderly (probably
tessellating) way; and the mowing action would be repeated on the new subarea of the lawn. The program would contain enough invocations of the orderly
method for mowing subareas so as to completely mow the entire lawn. That
is, a human programmer would exploit the considerable regularity and symmetry inherent in the problem environment by decomposing the problem
into subproblems and would then repeatedly use the solution to the subproblem in order to solve the overall problem.
In applying genetic programming with automatically defined functions
to the lawnmower problem, we decided that each individual overall
program in the population will consist of two function-defining branches
(defining a zero-argument function called ADF0 and a one-argument function ADF1) and a final (rightmost) result-producing branch. The second
defined function ADF1 can hierarchically refer to the first defined function ADF0. We envisaged that the first automatically defined function, ADF0,
should be capable of limited, local motion and that the second automatically defined function, ADF1, should be capable of motion over
larger distances.
We first consider the two function-defining branches.
The terminal set, T_adf0, for the zero-argument defined function ADF0
consists of
T_adf0 = {(LEFT), (MOW), ℜ_v8}.
The function set, F_adf0, for the zero-argument defined function ADF0 is
F_adf0 = {V8A, PROGN}
with an argument map of
{2, 2}.
The body of ADF0 is a composition of primitive functions from the function set, F_adf0, and terminals from the terminal set, T_adf0.
The terminal set, T_adf1, for the one-argument defined function ADF1 taking
dummy variable ARG0 consists of
T_adf1 = {ARG0, (LEFT), (MOW), ℜ_v8}.
The function set, F_adf1, for the one-argument defined function ADF1 is
F_adf1 = {ADF0, V8A, FROG, PROGN},
with an argument map of
{0, 2, 1, 2}.
The body of ADF1 is a composition of primitive functions from the function set, F_adf1, and terminals from the terminal set, T_adf1.
We now consider the result-producing branch.
The terminal set, T_rpb, for the result-producing branch is
T_rpb = {(LEFT), (MOW), ℜ_v8}.
The function set, F_rpb, for the result-producing branch is
F_rpb = {ADF0, ADF1, V8A, FROG, PROGN},
with an argument map of
{0, 1, 2, 1, 2}.
The result-producing branch is a composition of the functions from the
function set, F_rpb, and terminals from the terminal set, T_rpb.
Table 8.2 summarizes the key features of the lawnmower problem in an
unobstructed yard with 64 squares with automatically defined functions.
Table 8.2 Tableau with ADFs for the 64-square lawnmower problem.
Objective: Find a program to control a lawnmower so that it mows the grass on all 64 squares of lawn in an unobstructed yard.
Architecture of the overall program with ADFs: One result-producing branch and two function-defining branches, with ADF0 taking no arguments and ADF1 taking one argument and with ADF1 hierarchically referring to ADF0.
Parameters: Branch typing.
Terminal set for the result-producing branch: (LEFT), (MOW), and the random constants ℜ_v8.
Function set for the result-producing branch: ADF0, ADF1, V8A, FROG, and PROGN.
Terminal set for the function-defining branch ADF0: (LEFT), (MOW), and the random constants ℜ_v8.
Function set for the function-defining branch ADF0: V8A and PROGN.
Terminal set for the function-defining branch ADF1: ARG0, (LEFT), (MOW), and the random constants ℜ_v8.
Function set for the function-defining branch ADF1: V8A, PROGN, and FROG, and ADF0 (hierarchical reference of ADF0 by ADF1).
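The three-branch architecture and its argument maps can be written down as plain data. The following is only an illustrative encoding, not the representation used in the runs; the dictionary layout, the key "RPB", and the string "R_v8" (standing for the random vector constants) are assumptions made for this sketch.

```python
# Each branch lists its terminals and maps every permitted function to its arity.
ARCHITECTURE = {
    "ADF0": {
        "terminals": ["(LEFT)", "(MOW)", "R_v8"],
        "functions": {"V8A": 2, "PROGN": 2},
    },
    "ADF1": {
        "terminals": ["ARG0", "(LEFT)", "(MOW)", "R_v8"],
        "functions": {"ADF0": 0, "V8A": 2, "FROG": 1, "PROGN": 2},
    },
    "RPB": {  # result-producing branch
        "terminals": ["(LEFT)", "(MOW)", "R_v8"],
        "functions": {"ADF0": 0, "ADF1": 1, "V8A": 2, "FROG": 1, "PROGN": 2},
    },
}


def argument_map(branch: str) -> list:
    """Arities of the branch's functions, in the order the function set lists them."""
    return list(ARCHITECTURE[branch]["functions"].values())


def may_call(caller: str, callee: str) -> bool:
    """True if the callee appears in the caller's function set."""
    return callee in ARCHITECTURE[caller]["functions"]
```

Here `argument_map("RPB")` gives the map {0, 1, 2, 1, 2} listed above, and `may_call("ADF1", "ADF0")` holds while `may_call("ADF0", "ADF1")` does not, which is the hierarchical reference described in the tableau.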
8.9 LAWN SIZE OF 64 WITH ADFs
When genetic programming with automatically defined functions is applied
to this problem, the results are very different from the haphazard solutions
obtained without automatically defined functions.
We illustrate this by examining five particular runs (out of 76) of this
problem.
In the first illustrative run of this problem with automatically defined functions, the following 100%-correct 78-point program scoring 64 (out of 64)
emerged in generation 2 (a very early generation):
(progn (defun ADF0 ()
(values (V8A (PROGN (V8A (V8A (LEFT) (5,5)) (PROGN
(MOW) (LEFT))) (V8A (PROGN (MOW) (MOW)) (V8A (MOW)
(MOW)))) (V8A (PROGN (V8A (1,4) (MOW)) (PROGN (3,1)
(MOW))) (PROGN (PROGN (3,1) (MOW)) (PROGN (LEFT)
(LEFT)))))))
(defun ADF1 (ARG0)
(values (V8A (PROGN (FROG (PROGN ARG0 (ADF0))) (V8A
(PROGN (MOW) (ADF0)) (V8A (V8A (ADF0) (3,4)) (V8A
(ADF0) ARG0)))) (V8A (FROG (FROG (MOW))) (PROGN (PROGN
(MOW) (3,5)) (PROGN (MOW) (MOW)))))))
(values (V8A (ADF1 (ADF1 (V8A (7,1) (LEFT)))) (V8A (V8A
(PROGN (LEFT) (LEFT)) (V8A (7,0) (LEFT))) (FROG (V8A
(ADF0) (MOW))))))).
Figure 8.18 Trajectory of row-mowing lawnmower from run 1 with ADFs.
The result-producing branch of this 78-point program contains two invocations of ADF1, one invocation of ADF0, four LEFTs, and one MOW. ADF1
contains four invocations of ADF0, no turns, and five MOWs. ADF0 contains
eight MOWs and four LEFTs.
Figure 8.18 shows the trajectory of the row-mowing lawnmower for this 78-point
program from run 1 with automatically defined functions. The lawnmower
here takes advantage of the inherent regularity of the problem environment. It
mows an entire row consisting of eight consecutive squares in an easterly direction and then proceeds to the next row to the south and does the same. The fact
that the entire trajectory can be conveniently presented in only one figure testifies to this solution's predominantly regular behavior.
This solution is a hierarchical decomposition of the problem. First, genetic
programming discovered a decomposition of the overall problem into eight
subproblems each consisting of mowing a single row of eight consecutive
squares. Then, genetic programming discovered a sequence of turns and
moves to implement the mowing of an entire row of eight squares. Third,
genetic programming assembled the results of the row mowing subproblem
by repositioning the lawnmower to the adjacent row.
In run 2, the best of generation 0 is the following 54-point program scoring
56 (out of 64):
(progn (defun ADF0 ()
(values (PROGN (PROGN (V8A (V8A (MOW) (LEFT)) (PROGN
(6,4) (MOW))) (V8A (V8A (4,3) (3,3)) (V8A (LEFT)
(LEFT)))) (PROGN (V8A (PROGN (3,1) (5,3)) (V8A (MOW)
(MOW))) (PROGN (V8A (MOW) (LEFT)) (PROGN (MOW)
(MOW)))))))
(defun ADF1 (ARG0)
(values (FROG (PROGN (V8A (FROG (ADF0)) (PROGN (ADF0)
(ADF0))) (V8A (V8A (LEFT) (1,5)) (PROGN (ADF0)
(LEFT)))))))
(values (ADF1 (ADF1 (PROGN (ADF1 (LEFT)) (PROGN (LEFT)
(ADF0))))))).
The raw fitness of 56 achieved by the best of generation 0 with automatically defined functions is considerably better than the raw fitness (i.e.,
eight) of the previously cited best of generation 0 without automatically
defined functions.
Figure 8.19 shows the improvement in fitness from generation to generation for this run with automatically defined functions.
Figure 8.19 Fitness curves of run 2 for the 64-square lawnmower problem with ADFs.
Figure 8.20 Hits histograms for run 2 of the 64-square lawnmower problem for generations 0
through 5 with ADFs.
Figure 8.20 shows the hits histograms for generations 0, 1, 2, 3, 4, and 5 of
the same run of this problem with automatically defined functions. The first
eight buckets each represent a range of eight values of hits; the ninth bucket
contains only programs whose raw fitness (i.e., hits) is precisely 64 (i.e., a
100%-correct solution). Note the arrow on the histogram for generation 5
pointing to the simultaneous emergence of four 100%-correct individuals in
the population on that generation.
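The bucket layout just described (eight ranges of eight hit values, plus a ninth bucket reserved for a perfect score) can be sketched as follows. The function name and its parameters are illustrative and not taken from the text.

```python
def hits_histogram(hits, lawn_size=64, bucket_width=8):
    """Count individuals per bucket: 0-7, 8-15, ..., 56-63, and exactly lawn_size."""
    n_ranges = lawn_size // bucket_width
    counts = [0] * (n_ranges + 1)  # the last bucket holds only perfect scores
    for h in hits:
        if not 0 <= h <= lawn_size:
            raise ValueError(f"hits value {h} is outside 0..{lawn_size}")
        index = n_ranges if h == lawn_size else h // bucket_width
        counts[index] += 1
    return counts


# A toy population of nine individuals, four of which are 100%-correct.
example = hits_histogram([0, 7, 8, 56, 63, 64, 64, 64, 64])
```

For the toy population above, the last bucket holds four individuals, mirroring the four simultaneous solutions noted for generation 5.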
Figure 8.21 shows the structural complexity curves for run 2 of the
64-square lawnmower problem with automatically defined functions. The
figure shows, by generation, the structural complexity in the best-of-generation program and the average of the structural complexity of the programs in the population as a whole. The structural complexity of the best
of generation 0 is 63 and the average of the structural complexity of the
programs in the population as a whole for generation 0 is 287 with automatically defined functions.
The following 100%-correct 42-point program scoring 64 (out of 64) emerged
in generation 5:
(progn (defun ADF0 ()
(values (PROGN (V8A (0,1) (2,0)) (V8A (V8A (PROGN (MOW)
(LEFT)) (V8A (MOW) (LEFT))) (PROGN (V8A (LEFT) (LEFT))
(PROGN (MOW) (MOW)))))))
(defun ADF1 (ARG0)
(values (V8A (FROG (FROG (ADF0))) (PROGN (PROGN (V8A
(MOW) (ADF0)) (V8A (ADF0) (MOW))) (V8A (FROG (ADF0))
(V8A ARG0 ARG0))))))
(values (ADF1 (ADF1 (ADF1 (ADF1 (ADF0))))))).
This 42-point solution is a hierarchical decomposition of the problem.
Genetic programming discovered the decomposition of the overall problem,
discovered the content of each subroutine, and assembled the results of the
multiple calls to the subroutines into a solution of the overall problem. The
result-producing branch does not contain any LEFT, MOW, or FROG operations
at all. ADF1 contains four invocations of ADF0, two MOWs, three FROGs
(each applied to the (0,0) returned by ADF0), and no LEFT operations. ADF0
contains four MOWs and four LEFTs.
Figure 8.22 shows the column-mowing trajectory of the lawnmower for
this 42-point solution. Note the difference between this regular trajectory and
the haphazard character of the three partial trajectories shown in figures 8.9,
8.10, and 8.11. The lawnmower here takes advantage of the regularity of the
problem environment. It performs a tessellating activity that covers the entire
lawn. Specifically, it mows four consecutive squares in a column in a northerly direction, shifts one column to the west, and then does the same thing in
the next column. This solution involves only eight multiple visits to the same
square.
When this 42-point program is evaluated, ADF0 is executed first by the
result-producing branch. ADF0 begins with a PROGN whose first argument
is (V8A (0,1) (2,0)). Since vector addition V8A has no side effects
and since the return value of PROGN is the value returned by its last (second) argument, this first argument to the PROGN can be ignored. Since the
remainder of ADF0 contains only MOW and LEFT operations, ADF0 returns
(0,0). As it turns out, ADF1 never uses its dummy variable.
Figure 8.21 Structural complexity curves of run 2 of the 64-square lawnmower problem
with ADFs.
Figure 8.22 Trajectory of column-mowing lawnmower from run 2 with ADFs.
Figure 8.23 Trajectory of swirler from run 3 of lawnmower problem with ADFs.
The basic activity of ADF0 is to mow four squares of lawn in a northwesterly zigzag pattern. This zigzag action is illustrated at the starting point (4,4)
in the middle of the figure. ADF0 moves forward (i.e., north) one square and
mows that square; it then turns left (i.e., west) and moves forward and mows
that square; it then turns left three times (so that it is again oriented north);
and it then moves and mows two squares.
The northwesterly zigzag mowing activity of ADF0 is then repeatedly
invoked. The result-producing branch invokes ADF1 a total of four times.
Each time ADF1 is invoked, ADF0 is invoked four times. This hierarchy of
invocations produces a total of 16 calls for the zigzag activity of ADF0. Because of the initial direct call of ADF0 at the beginning of the evaluation of
the result-producing branch, the last of the 16 hierarchical invocations of
ADF0 is not needed since the program is terminated by virtue of the completion of the overall task.
This zigzagging solution is a hierarchical decomposition and solution
of the problem involving three simultaneous, automatic discoveries.
Genetic programming discovered a decomposition of the overall problem
into 16 subproblems each consisting of the northwesterly zigzag mowing
pattern. Genetic programming also discovered the sequence of turns and
moves to implement the northwesterly zigzag mowing action. In addition, genetic programming assembled the results of the mowing motion
into a solution of the overall problem by appropriately repositioning the
lawnmower.
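The behavior of this 42-point program can be traced with a small interpreter. The sketch below is an illustration under stated assumptions, not the evaluator used in the runs: programs are written as nested Python tuples, arguments are evaluated left to right, the mower starts at (4,4) facing north on an 8-by-8 toroidal lawn, and the axis convention chosen for FROG (first component ahead, second to the right) is a guess that does not matter here because every FROG in this program receives (0,0). No limit on the number of operations is enforced.

```python
# Headings as (d_col, d_row). Index 0 is north; LEFT advances to the next index.
HEADINGS = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # north, west, south, east


class Mower:
    """Lawnmower state on a toroidal lawn (the coordinate layout is an assumption)."""

    def __init__(self, size=8, start=(4, 4)):
        self.size, self.pos, self.heading = size, start, 0
        self.mowed = set()
        self.moves = 0  # MOW and FROG operations executed
        self.turns = 0  # LEFT operations executed

    def left(self):
        self.heading = (self.heading + 1) % 4
        self.turns += 1
        return (0, 0)

    def _jump(self, ahead, right):
        d_col, d_row = HEADINGS[self.heading]
        col = (self.pos[0] + ahead * d_col + right * d_row) % self.size
        row = (self.pos[1] + ahead * d_row - right * d_col) % self.size
        self.pos = (col, row)
        self.mowed.add(self.pos)  # the destination square is mowed
        self.moves += 1

    def mow(self):
        self._jump(1, 0)  # one square forward
        return (0, 0)

    def frog(self, vec):
        self._jump(vec[0], vec[1])  # assumed axis convention, see lead-in
        return vec  # FROG returns its argument


def evaluate(expr, mower, adfs, arg0=(0, 0)):
    """Evaluate a program tree and return its vector value."""
    if expr == "ARG0":
        return arg0
    if isinstance(expr[0], int):
        return expr  # a random vector constant such as (0, 1)
    op, *args = expr
    vals = [evaluate(a, mower, adfs, arg0) for a in args]
    if op == "MOW":
        return mower.mow()
    if op == "LEFT":
        return mower.left()
    if op == "FROG":
        return mower.frog(vals[0])
    if op == "PROGN":
        return vals[-1]  # value of the last argument
    if op == "V8A":
        return ((vals[0][0] + vals[1][0]) % 8, (vals[0][1] + vals[1][1]) % 8)
    if op == "ADF0":
        return evaluate(adfs["ADF0"], mower, adfs)
    if op == "ADF1":
        return evaluate(adfs["ADF1"], mower, adfs, vals[0])
    raise ValueError(f"unknown operation {op!r}")


M, L, A0 = ("MOW",), ("LEFT",), ("ADF0",)
ADFS = {
    "ADF0": ("PROGN", ("V8A", (0, 1), (2, 0)),
             ("V8A", ("V8A", ("PROGN", M, L), ("V8A", M, L)),
                     ("PROGN", ("V8A", L, L), ("PROGN", M, M)))),
    "ADF1": ("V8A", ("FROG", ("FROG", A0)),
             ("PROGN", ("PROGN", ("V8A", M, A0), ("V8A", A0, M)),
                       ("V8A", ("FROG", A0), ("V8A", "ARG0", "ARG0")))),
}
RPB = ("ADF1", ("ADF1", ("ADF1", ("ADF1", A0))))

mower = Mower()
result = evaluate(RPB, mower, ADFS)
print(len(mower.mowed), "of 64 squares mowed")
```

Under these assumptions the trace covers all 64 squares, and a single call of ADF0 from a fresh mower mows exactly four squares and leaves the mower facing north again, which matches the zigzag described above.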
In run 3, the following 107-point "swirler" emerged in generation 5 as a
100%-correct solution to the problem:
(progn (defun ADF0 ()
(values (V8A (V8A (V8A (PROGN (7,0) (2,5)) (V8A (6,4)
(MOW))) (PROGN (V8A (LEFT) (MOW)) (V8A (5,1)
(LEFT)))) (V8A (V8A (V8A (MOW) (0,1)) (PROGN (PROGN
(PROGN (LEFT) (LEFT)) (V8A (LEFT) (MOW))) (PROGN (V8A
(0,1) (MOW)) (V8A (LEFT) (0,5))))) (PROGN (PROGN
(MOW) (MOW)) (PROGN (LEFT) (MOW)))))))
(defun ADF1 (ARG0)
(values (PROGN (FROG (V8A (V8A (V8A (V8A (PROGN (3,2)
(ADF0)) (V8A (ADF0) (MOW))) (FROG (5,1))) (V8A (PROGN
(3,2) (ADF0)) (V8A (ADF0) (MOW)))) (V8A ARG0 ARG0)))
(FROG (PROGN (FROG (7,4)) (V8A (MOW) (MOW)))))))
(values (ADF1 (V8A (PROGN (PROGN (MOW) (3,6)) (ADF1 (V8A
(LEFT) (LEFT)))) (PROGN (PROGN (PROGN (V8A (ADF1 (MOW))
(PROGN (ADF0) (2,0))) (PROGN (ADF1 (MOW)) (ADF1
(ADF0)))) (PROGN (ADF1 (1,5)) (V8A (2,6) (LEFT))))
(ADF1 (ADF0))))))).
Figure 8.24 Trajectory of crisscrosser of run 4 of lawnmower problem with ADFs.
Figure 8.23 shows that the trajectory of this 107-point program consists of a
counterclockwise swirling motion which very efficiently covers 100% of the
lawn.
In run 4, the following 95-point crisscrosser emerged in generation 4 as a
100%-correct solution:
(progn (defun ADF0 ()
(values (V8A (PROGN (V8A (V8A (V8A (6,6) (MOW)) (PROGN
(1,6) (V8A (MOW) (LEFT)))) (V8A (MOW) (0,1))) (V8A
(V8A (MOW) (MOW)) (PROGN (MOW) (MOW)))) (PROGN (PROGN
(V8A (6,7) (MOW)) (V8A (LEFT) (MOW))) (PROGN (V8A
(MOW) (LEFT)) (PROGN (0,6) (MOW)))))))
(defun ADF1 (ARG0)
(values (V8A (PROGN (V8A (PROGN (MOW) (MOW)) (V8A (5,1)
(LEFT))) (PROGN (FROG (LEFT)) (V8A ARG0 ARG0))) (V8A
(PROGN (PROGN (MOW) (6,2)) (V8A (ADF0) (MOW))) (PROGN
(V8A ARG0 (ADF0)) (FROG (ADF0)))))))
(values (FROG (PROGN (ADF1 (PROGN (LEFT) (5,1))) (PROGN
(FROG (ADF0)) (PROGN (V8A (PROGN (ADF1 (3,7)) (FROG
(MOW))) (ADF1 (ADF1 (MOW)))) (PROGN (FROG (FROG
(LEFT))) (V8A (FROG (2,0)) (PROGN (5,0) (ADF0)))))))))).
Figure 8.24 shows that the trajectory of this best-of-run individual crisscrosses the lawn with both vertical and horizontal motions in such a way as
to mow the entire lawn.
In run 5, the following 56-point jumping column mower emerged in
generation 5 as a 100%-correct solution:
(progn (defun ADF0 ()
(values (PROGN (PROGN (2,4) (MOW)) (V8A (PROGN (V8A
(PROGN (MOW) (5,0)) (PROGN (MOW) (MOW))) (V8A (MOW)
(MOW))) (V8A (PROGN (MOW) (MOW)) (V8A (5,6)
(6,6)))))))
(defun ADF1 (ARG0)
(values (PROGN (PROGN (PROGN (FROG (ADF0)) (PROGN (4,5)
(MOW))) (V8A (V8A (1,6) ARG0) (FROG (ADF0)))) (V8A
(FROG (ADF0)) (PROGN (FROG (MOW)) (FROG (ADF0)))))))
(values (ADF1 (V8A (V8A (ADF1 (ADF0)) (PROGN (6,2)
(3,7))) (ADF1 (FROG (MOW))))))).
The column mowing behavior of this 56-point program can be seen when
it is simplified to the following equivalent 29-point program:
(progn (defun ADF0 ()
(values (PROGN (MOW) (MOW) (MOW) (MOW) (MOW) (MOW)
(MOW) (MOW) (3,4))))
(defun ADF1 (ARG0)
(values (PROGN (FROG (ADF0)) (MOW) (FROG (ADF0)) (V8A
(FROG (ADF0)) (PROGN (MOW) (FROG (ADF0)))))))
(values (ADF1 (PROGN (ADF1 (ADF0)) (ADF1 (MOW)))))).
Figure 8.25 shows that the trajectory of this jumping column mower mows
an entire vertical column of the lawn and then jumps to another column and
repeats this behavior.
The 76 100%-correct solutions obtained in 76 runs of the lawnmower problem with automatically defined functions can be classified, as shown in table
8.3, into five motifs based on the general nature of their trajectories. A human
programmer would probably write a program using a motif involving row
or column mowing; however, as can be seen, only about half of the 76 runs
employed this motif. This table is reminiscent of table 6.7 which demonstrated
that genetic programming employed parity functions only 42% of the time in
solving the Boolean 5-parity problem.
Figure 8.25 Trajectory of jumping column mower of run 5 of lawnmower problem with ADFs.
Table 8.3 Motifs of the trajectories of 76 solutions to the lawnmower problem with
ADFs.
Motif                    Percentage of 76 runs
Row or column mowing     49%
Zigzagging               20%
Large swirls             17%
Crisscrossing            10%
Tight swirls             4%
A videotape visualization of these trajectories can be found in Genetic
Programming II Videotape: The Next Generation (Koza and Rice 1994).
The average structural complexity, S_with, of the 100%-correct programs from
the 76 successful runs of the 64-square lawnmower problem with automatically defined functions is 76.9 points.
Figure 8.26 presents the performance curves based on the 76 runs (out
of 76 runs) of the 64-square lawnmower problem with automatically
defined functions. The cumulative probability of success, P(M,i), is 100%
by generation 10. The two numbers in the oval indicate that if this problem is run through to generation 10, processing a total of E_with = 11,000
individuals (i.e., 1,000 x 11 generations x 1 run) is sufficient to yield a
solution to this problem with 99% probability.
Table 8.4 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the lawnmower problem
with automatically defined functions and without them.
ZJJ The Lawnmower Problem
With Defined Functions
0) 0
U)
\0) /<)
lr
A . E
()
-
+r
a
-
-
-
. T
-
F
-
E
N
a
a
c) ()
I
!a
-
a
CH
+) .-
.-
A
-
L
||, -
100,000
(I,l%o)
Generation
Figure 8.26 Performance curves for the 64-square lawnmower problem showing that
Ewith = 11,000 withADFs.
Table 8.4 Comparison table for the 64-square lawnmower problem.
                                  Without ADFs    With ADFs
Average structural complexity S   280.8           76.9
Computational effort E            100,000         11,000
Figure 8.27 Summary graphs for the 64-square lawnmower problem.
Figure 8.28 Performance curves for the 32-square lawnmower problem showing that
E_with = 5,000 with ADFs.
Table 8.5 Comparison table for the 32-square lawnmower problem.
                                  Without ADFs    With ADFs
Average structural complexity S   145.0           66.3
Computational effort E            19,000          5,000
Figure 8.27 summarizes the information in the comparison table for the 64-square
problem (table 8.4) and shows a structural complexity ratio, R_S, of 3.65 and an
efficiency ratio, R_E, of 9.09.
8.10 LAWN SIZE OF 32 WITH ADFs
When the size of the problem is scaled down from 64 to 32 squares, the average structural complexity, S_with, of the 100%-correct programs from the 52
successful runs (out of 52 runs) of the 32-square lawnmower problem with
automatically defined functions is 66.3 points. This value is only slightly
smaller than the average structural complexity of 76.9 when the lawn had
64 squares.
Figure 8.28 presents the performance curves based on the 52 runs of the
32-square lawnmower problem with automatically defined functions. The
cumulative probability of success, P(M,i), is 100% by generation 4. The
two numbers in the oval indicate that if this problem is run through to
generation 4, processing a total of E_with = 5,000 individuals (i.e., 1,000 x 5
generations x 1 run) is sufficient to yield a solution to this problem with
99% probability.
Figure 8.29 Summary graphs for the 32-square lawnmower problem.
Figure 8.30 Performance curves for the 48-square lawnmower problem showing that
E_with = 9,000 with ADFs.
Table 8.6 Comparison table for the 48-square lawnmower problem.
                                  Without ADFs    With ADFs
Average structural complexity S   217.6           69.0
Computational effort E            56,000          9,000
Figure 8.31 Summary graphs for the 48-square lawnmower problem.
Figure 8.32 Performance curves for the 80-square lawnmower problem showing that
E_with = 17,000 with ADFs.
Table 8.5 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the 32-square lawnmower problem with automatically defined functions and without them.
Figure 8.29 summarizes the information in this comparison table and
shows a structural complexity ratio, R_S, of 2.19 and an efficiency ratio,
R_E, of 3.80.
8.11 LAWN SIZE OF 48 WITH ADFs
When the size of the lawn is 48 squares, the average structural complexity,
S_with, of the 100%-correct programs from the 40 successful runs (out of 40)
of the lawnmower problem with automatically defined functions is
69.0 points.
Figure 8.30 presents the performance curves based on the 40 runs of the
48-square lawnmower problem with automatically defined functions. The
cumulative probability of success, P(M,i), is 100% by generation 8. The
two numbers in the oval indicate that if this problem is run through to
generation 8, processing a total of E_with = 9,000 individuals (i.e., 1,000 x 9
generations x 1 run) is sufficient to yield a solution to this problem with
99% probability.
Table 8.6 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the 48-square lawnmower
problem with automatically defined functions and without them.
Figure 8.31 summarizes the information in this comparison table and
shows a structural complexity ratio, R_S, of 3.15 and an efficiency ratio,
R_E, of 6.22.
Table 8.7 Comparison table for the 80-square lawnmower problem.
                                  Without ADFs    With ADFs
Average structural complexity S   366.1           78.8
Computational effort E            561,000         17,000
Figure 8.33 Summary graphs for the 80-square lawnmower problem.
8.12 LAWN SIZE OF 80 WITH ADFs
When the lawn size is 80, the average structural complexity, S_with, of the 100%-correct
programs from the 90 successful runs (out of 90) of the lawnmower
problem with automatically defined functions is 78.8 points.
Figure 8.32 presents the performance curves based on the 90 runs of the
80-square lawnmower problem with automatically defined functions. The
cumulative probability of success, P(M,i), is 100% by generation 16. The
two numbers in the oval indicate that if this problem is run through to
generation 16, processing a total of E_with = 17,000 individuals (i.e., 1,000 x
17 generations x 1 run) is sufficient to yield a solution to this problem with
99% probability.
Table 8.7 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the 80-square lawnmower
problem with automatically defined functions and without them.
Figure 8.33 summarizes the information in this comparison table and
shows a structural complexity ratio, R_S, of 4.65 and an efficiency ratio, R_E,
of 33.00.
8.13 LAWN SIZE OF 96 WITH ADFs
When the lawn size is 96, the average structural complexity, S_with, of the 100%-correct programs from the 137 successful runs (out of 137) of the lawnmower problem with automatically defined functions is 84.9 points. The reader may recall that only 14 of 284 runs were successful for the 96-square lawnmower problem without automatically defined functions.
Figure 8.34 presents the performance curves based on the 137 runs of the 96-square lawnmower problem with automatically defined functions. The cumulative probability of success, P(M,i), is 100% by generation 19. The two numbers in the oval indicate that if this problem is run through to generation 19, processing a total of E_with = 20,000 individuals (i.e., 1,000 x 20 generations x 1 run) is sufficient to yield a solution to this problem with 99% probability.
Table 8.8 compares the average structural complexity, S_without and S_with, and the computational effort, E_without and E_with, for the 96-square lawnmower problem with automatically defined functions and without them.
Figure 8.35 summarizes the information in this comparison table and shows a structural complexity ratio, R_S, of 4.81 and an efficiency ratio, R_E, of 234.6.

Figure 8.34 Performance curves for the 96-square lawnmower problem showing that E_with = 20,000 with ADFs.

Table 8.8 Comparison table for the 96-square lawnmower problem.
                                  Without ADFs    With ADFs
Average structural complexity S   408.8           84.9
Computational effort E            4,692,000       20,000

Figure 8.35 Summary graphs for the 96-square lawnmower problem.

8.14 SUMMARY FOR LAWN SIZES OF 32, 48, 64, 80, AND 96
This chapter considered a problem with substantial symmetry and regularity in its problem environment. Five differently sized versions of the problem were solved, both with and without automatically defined functions.
For a fixed lawn size of 64, substantially fewer fitness evaluations are required to yield a solution to the problem with 99% probability with automatically defined functions than without them. Moreover, the average size of the programs that successfully solved the problem is considerably smaller with automatically defined functions than without them.
Table 8.9 compiles the observations from the experiments in this chapter into one table. As can be seen, for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96 squares, the efficiency ratio is greater than 1 (indicating that fewer fitness evaluations are required to yield a solution to the problem with 99% probability with automatically defined functions than without them).

Table 8.9 Summary table of the structural complexity ratio, R_S, and the efficiency ratio, R_E, for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96 squares.
Problem                      Structural complexity ratio R_S    Efficiency ratio R_E
Lawnmower - lawn size 32     2.19                               3.80
Lawnmower - lawn size 48     3.15                               6.22
Lawnmower - lawn size 64     3.65                               9.09
Lawnmower - lawn size 80     4.65                               33.00
Lawnmower - lawn size 96     5.06                               234.60

In other words, for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96, genetic programming with automatically defined functions yields a solution after fewer fitness evaluations than the solutions that are produced without automatically defined functions. What is more, genetic programming with automatically defined functions yields a solution that is smaller in overall size than the solutions that are produced without automatically defined functions. Moreover, automatically defined functions produce their greatest benefit in terms of reducing the number of fitness evaluations for the largest version of the problem.

Table 8.10 Comparison of the average structural complexity of solutions to the lawnmower problem, with and without ADFs.
Lawn size    32       48       64       80       96
S_without    145.0    217.6    280.8    366.1    408.8
S_with       66.3     69.0     76.9     78.8     84.9

Figure 8.36 Comparison of average structural complexity of solutions to the lawnmower problem for lawn sizes of 32, 48, 64, 80, and 96, with and without ADFs.
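The ratios summarized in table 8.9 follow directly from the per-size comparison tables. The short Python sketch below recomputes them from the values reported in tables 8.4 through 8.8; the dictionary layout and the function name are illustrative choices, not part of the original experiments.

```python
# (S_without, S_with, E_without, E_with) as reported in tables 8.4 through 8.8.
RESULTS = {
    32: (145.0, 66.3, 19_000, 5_000),
    48: (217.6, 69.0, 56_000, 9_000),
    64: (280.8, 76.9, 100_000, 11_000),
    80: (366.1, 78.8, 561_000, 17_000),
    96: (408.8, 84.9, 4_692_000, 20_000),
}

def ratios(lawn_size):
    """Return (R_S, R_E): the 'without ADFs' value divided by the 'with ADFs' value."""
    s_without, s_with, e_without, e_with = RESULTS[lawn_size]
    return s_without / s_with, e_without / e_with

for size in sorted(RESULTS):
    r_s, r_e = ratios(size)
    print(f"lawn size {size}: R_S = {r_s:.2f}, R_E = {r_e:.2f}")
```

For the 96-square lawn, the quotient 408.8 / 84.9 is about 4.8, which agrees with the value quoted alongside figure 8.35; table 8.9 lists 5.06 for that entry.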
8.15 SCALING FOR LAWN SIZES OF 32, 48, 64, 80, AND 96
The question arises as to how the average structural complexity, S, and the
computational effort, E, change as a function of problem size for the
lawnmower problem.
We first consider the average structural complexity, S, of the genetically
evolved solutions.
Table 8.10 consolidates the previously reported values of average structural complexity for lawns of sizes 32, 48, 64, 80, and 96, with and without
automatically defined functions for the lawnmower problem.
Figure 8.36 shows the relationship between the average structural complexity, S_without and S_with, of solutions for lawn sizes of 32, 48, 64, 80, and 96,
with and without automatically defined functions. As can be seen, the graphs
are approximately straight lines, with and without automatically defined functions. However, these two lines are different.
As previously observed, the average structural complexity, S_without, of a
solution to the lawnmower problem without automatically defined functions
ranges between 145.0 and 408.8 for lawns of sizes 32, 48, 64, 80, and 96; it is
about four and a half times the size of the lawn. However, with automatically
defined functions, the structural complexity, S_with, of the successful solutions
lies in the narrow range between 66.3 and 84.9. When the size of the problem
is scaled up from 64 to 80 to 96 squares of lawn, the average size of a successful solution increases from 76.9 to only 78.8 and to 84.9. Conversely, when the
size of the problem is scaled down from 64 to 48 to 32 squares of lawn, the
average size of a successful solution decreases from 76.9 to 69.0 and to 66.3,
respectively.
When we perform a linear least-squares regression on the five points relating to the runs without automatically defined functions, we find that the structural complexity, S_without, can be expressed in terms of the lawn size, L, as
$$S_{without} = 13.2 + 4.2L,$$
with a correlation of 1.00. The slope of 4.2 indicates that it takes approximately an additional 4.2 points in the program tree to mow each additional
square of lawn. The vertical intercept of 13.2 (shown by the point where the
dotted line intercepts the vertical axis in figure 8.36) suggests the program
size associated with a hypothetical lawn size of zero.
In contrast, when we perform a linear regression on the five points relating
to the runs with automatically defined functions, we find that the structural
complexity, S_with, can be stated in terms of lawn size, L, as
$$S_{with} = 56.39 + 0.29L,$$
with a correlation of 0.98. The slope indicates that it takes only about an additional 0.29 points in the program tree to mow each additional square of lawn.
This slope with automatically defined functions is only about a fourteenth of
the slope (4.2) without automatically defined functions. On the other hand,
the vertical intercept of 56.39 (associated with a solution for a hypothetical
lawn size of zero) is much larger with automatically defined functions than
without them. We interpret this to mean that there is a substantial fixed overhead associated with automatically defined functions, but relatively little
additional cost associated with growth in the size of this problem. Conversely,
there is much less fixed overhead involved without automatically defined
functions, but a substantial additional cost associated with growth in the size
of the problem.
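The two regression lines can be reproduced from the five data points of table 8.10. The sketch below is a plain ordinary-least-squares fit written for illustration; the helper name linear_fit is hypothetical, and small differences in the last digit relative to the quoted coefficients are to be expected because the tabulated averages are rounded.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares; returns (intercept, slope, correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy / math.sqrt(sxx * syy)

LAWN_SIZES = [32, 48, 64, 80, 96]
S_WITHOUT = [145.0, 217.6, 280.8, 366.1, 408.8]  # table 8.10
S_WITH = [66.3, 69.0, 76.9, 78.8, 84.9]          # table 8.10

for label, values in (("without ADFs", S_WITHOUT), ("with ADFs", S_WITH)):
    intercept, slope, r = linear_fit(LAWN_SIZES, values)
    print(f"S {label}: {intercept:.2f} + {slope:.2f} L  (correlation {r:.2f})")
```

The ratio of the two slopes is roughly 14, which is the "about a fourteenth" comparison made in the text.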
The scaling of the average structural complexity of solutions to this problem (and for the parity problem in section 6.15, and the bumblebee problem
in section 9.13) provides evidence in support of main point 5:
Main point 5: For the three problems herein for which a progression of
several scaled-up versions is studied, the average size of the solutions produced by genetic programming increases as a function of problem size at a
lower rate with automatically defined functions than without them.
This result is especially striking because our implementation of genetic programming is (for most problems herein) strongly predisposed to create larger
programs when automatically defined functions are being used.
During the creation of the initial random population and when new programs are created by crossover, we impose limitations on the size of the programs thus created. The limitations differ depending on whether programs in the population are represented using our usual LISP S-expressions or using the array method. (The only time that the array method, described in appendix D, is used herein is with the 3-, 4-, 5-, and 6-parity problems in chapter 6 and with the comparative study of the 15 architectures of the even-5-parity problem in chapter 7.)
When programs are represented using the usual LISP S-expressions, these limitations are imposed by the choices of two minor control parameters called D_initial and D_created (appendix D). The default value for the maximum size (measured by depth), D_initial, is 6 for the random individuals generated for the initial population. The default value for the maximum size (measured by depth), D_created, is 17 for programs created by the crossover operation. These default values of D_initial and D_created apply to the lawnmower problem.
The important point is that the limitations imposed by D_initial and D_created are applied separately to each branch of an overall program. Thus, the average size of programs in generation 0 with automatically defined functions is much larger (by a multiple approximately equal to the total number of branches in the overall program) than the average size without automatically defined functions. For example, since there are two automatically defined functions in the lawnmower problem, the multiple is about 3. This multiple is only approximate because the function sets of the various branches are typically different (e.g., because of the inclusion of the automatically defined functions in the function set of the result-producing branch and possibly in the function sets of one or more function-defining branches).
Remarkably, the observed improvement in parsimony with automatically defined functions for this problem occurs after the population overcomes the substantial (3-to-1) predisposition in favor of larger programs. This predisposition is apparent in figure 8.8, which shows that the structural complexity without automatically defined functions of the best of generation 0 is 23 and the average of the values of structural complexity for the population as a whole for generation 0 is 9.7. In contrast, figure 8.21 shows that the structural complexity with automatically defined functions of the best of generation 0 is 63 (i.e., about three times larger) and the average structural complexity of the entire population for generation 0 is 28.7 (i.e., also about three times larger).

Table 8.11 Comparison of computational effort for lawns of sizes 32, 48, 64, 80, and 96 for the lawnmower problem, with and without ADFs.
Lawn size    32        48        64         80         96
E_without    19,000    56,000    100,000    561,000    4,692,000
E_with       5,000     9,000     11,000     17,000     20,000
Figure 8.37 Comparison of computational effort for lawn sizes of 32, 48, 64, 80, and 96, with
and without ADFs.
Problems run with the array method (e.g., the even-3-, 4-, 5-, and 6-parity
problems in chapter 6 and the comparative study of the 15 architectures of
the even-5-parity problem of chapter 7) are not biased in this way. There is a
size neutrality when the array method is being used.
We now consider the computational effort required for the lawnmower
problem, with and without automatically defined functions.
Table 8.11 consolidates the values of computational effort for lawn sizes 32,
48, 64, 80, and 96, with and without automatically defined functions.
Figure 8.37 shows the computational effort, E_without and E_with, for lawn
sizes of 32, 48, 64, 80, and 96, both with and without automatically defined
functions. As can be seen, the relationship between the values of the computational effort, E_without (i.e., 19,000, 56,000, 100,000, 561,000, and 4,692,000) and
the lawn size is steep and nonlinear. The explosive growth of E_without (spanning more than two orders of magnitude) as a function of problem size is
evident from the figure when automatically defined functions are not involved.
The graph applicable to automatically defined functions is visible on this figure only as a thickening of the horizontal axis. The rate of increase of E_with is
dramatically less.
Figure 8.38 shows the same data as figure 8.37 using a logarithmic scale on
the vertical axis, thereby making the graph of E_with visible.
When we perform a linear regression on the five-point curve without automatically defined functions, we get a correlation of only 0.77 because of the
nonlinearity of this set of data. In particular, the computational effort, E_without,
can be stated in terms of the lawn size, L, as
$$E_{without} = -2,855,000 + 61,570L.$$
Figure 8.39 shows the poor fit between the actual data for E_without and the
straight line produced by the linear regression (dotted line) for the lawnmower
problem.
Figure 8.38 Comparison of computational effort for lawn sizes of 32, 48, 64, 80, and 96, with
and without ADFs, with logarithmic scale.
Figure 8.39 Comparison of actual data for E_without and linear regression line for the
lawnmower problem without ADFs.
When we perform an exponential regression on the five-point curve without automatically defined functions, we find that the computational effort,
E_without, can be stated in terms of the lawn size, L, as
$$E_{without} = 944.2 \times 10^{0.0362L},$$
with a correlation of 0.98. That is, an exponential is a better fit to the
observed data. The computational effort, E_without, without automatically
defined functions grows approximately exponentially with problem size
for this problem.
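Both fits to E_without can be checked numerically. The sketch below assumes that the exponential regression is an ordinary least-squares fit of log10(E) against L (an assumption, although it reproduces the quoted coefficients); the helper linear_fit is a hypothetical name.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares; returns (intercept, slope, correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / math.sqrt(sxx * syy)

LAWN_SIZES = [32, 48, 64, 80, 96]
E_WITHOUT = [19_000, 56_000, 100_000, 561_000, 4_692_000]  # table 8.11

# Straight line through the raw values: E = a_lin + b_lin * L (a poor fit).
a_lin, b_lin, r_lin = linear_fit(LAWN_SIZES, E_WITHOUT)

# Exponential model: fit log10(E) = a_log + b_log * L,
# so that E = coefficient * 10 ** (b_log * L) with coefficient = 10 ** a_log.
a_log, b_log, r_log = linear_fit(LAWN_SIZES, [math.log10(e) for e in E_WITHOUT])
coefficient = 10 ** a_log

print(f"linear:      {a_lin:,.0f} + {b_lin:,.0f} L  (correlation {r_lin:.2f})")
print(f"exponential: {coefficient:.1f} x 10^({b_log:.4f} L)  (correlation {r_log:.2f})")
```

Working in log space turns the exponential model into a straight-line fit, which is why the same helper serves both cases.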
In contrast, the progression of values of computational effort, E_with, with
automatically defined functions (5,000, 9,000, 11,000, 17,000, and 20,000) is
a nearly linear sequence for the problem sizes of 32, 48, 64, 80, and 96. In fact,
when we perform a linear regression on the five-point curve with automatically defined functions, we find that the computational effort, E_with, can be
expressed in terms of the lawn size, L, as
$$E_{with} = -2,800 + 237.5L,$$
with a correlation of 0.99. The slope indicates that it takes about an additional
237.5 fitness evaluations for each additional square of lawn.
The scaling of E_without and E_with for this problem (and for the parity problem in section 6.15 and the bumblebee problem in section 9.13) provides
evidence in support of main point 6:
Main point 6: For the three problems herein for which a progression of
several scaled-up versions is studied, computational effort increases as a function of problem size at a lower rate with automatically defined functions than
without them.
8.16 WALLCLOCK TIME FOR THE LAWNMOWER PROBLEM
The question arises as to whether automatically defined functions are beneficial in terms of the amount of elapsed time required to yield a solution (or
satisfactory result) to a problem.
Every adaptive algorithm starts with one or more points in the search space
of the problem and then iteratively performs the following two steps: measuring the fitness of the current point(s) and using the information about fitness to create new point(s) in the search space. The trajectory through the
search space, starting at the initial point(s) and ending at the final point(s), is
generally different for different algorithms.
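The two-step loop described above can be written generically. The sketch below is a toy illustration only, not the genetic programming system used in this book: the function names, the integer search space, and the simple hill-climbing rule are all assumptions made for the example. It counts fitness evaluations, the quantity on which the measure E is based.

```python
def adaptive_search(initial_points, fitness, create_new_points, iterations):
    """Generic adaptive loop: measure fitness, then create new points from it."""
    points = list(initial_points)
    evaluations = 0
    for _ in range(iterations):
        # Step 1: measure the fitness of the current point(s).
        scored = [(fitness(point), point) for point in points]
        evaluations += len(scored)
        # Step 2: use the fitness information to create new point(s).
        points = create_new_points(scored)
    return points, evaluations

def toy_fitness(x):
    """Hypothetical fitness with a single peak at x = 7."""
    return -(x - 7) ** 2

def neighbours_of_best(scored):
    """Hill-climbing rule: keep the best point and its two integer neighbours."""
    _, best_point = max(scored, key=lambda pair: pair[0])
    return [best_point - 1, best_point, best_point + 1]

final_points, evaluations = adaptive_search([0], toy_fitness, neighbours_of_best, iterations=10)
print(final_points, evaluations)
```

Swapping in a different create_new_points rule changes the trajectory through the search space without changing the outer loop, which is the sense in which different adaptive algorithms share this structure.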
The computational burden of an adaptive algorithm can be measured in
several different ways. Each measure has particular advantages and disadvantages. The measure, E, of computational effort (described in section 4.11)
is the method that we have used so far in this book. E is the minimum number of fitness evaluations required to get a solution (or satisfactory result)
with a specified, satisfactorily high probability (say 99%).
For genetic programming, we have demonstrated, for several problems,
that less computational effort, E, is required to solve the problem with automatically defined functions than without them, provided the difficulty of the
problem is above a certain breakeven point for computational effort (main
point 3). However, as previously mentioned, this measure treats all fitness
evaluations as if they were equally burdensome. It is conceivable, therefore,
that automatically defined functions might be beneficial in terms of E, but
not beneficial in terms of elapsed time (wallclock time).
We deferred the discussion of wallclock time to this chapter because we are unable to compute wallclock time for the Boolean problems in this book in a meaningful manner. The reason is that our implementation of the Boolean problems is extensively optimized (as described in section 6.9) with the specific objective of converting programs of vastly different sizes and shapes into programs that consume almost equal (and much less) wallclock time. These optimizations produce a speedup of between one and two orders of magnitude (e.g., 17:1 for the even-5-parity problem with {4, 4} as the argument map for the automatically defined functions). The progression of even-parity problems in chapter 6 and the comparison of the 15 architectures in chapter 7 simply could not have been run in any reasonable amount of time without these optimizations, so we did not have the luxury to forgo these optimizations.
The lawnmower problem in this chapter and the bumblebee problem in the next chapter were specifically designed to run fast enough, without any distorting optimizations, to permit a comparative study of wallclock time.
Measurement of wallclock time is performed by collecting timestamps at the beginning of each run and at the end of each generation within the run.
If every run of genetic programming were successful in yielding a solution (or satisfactory result), the wallclock time required to yield a solution (or satisfactory result) would be easy to measure. If success is guaranteed to occur, the observed average wallclock time is simply the sum of the elapsed times for all the runs in a series of runs divided by the number of runs. When a particular run of genetic programming is not successful after running the prespecified maximum number of generations, G, there is no way to know whether or when the run would ever be successful. There is no knowable value for the elapsed time required that will yield a solution (or satisfactory result), and this simple averaging calculation cannot be used. Measuring the computational burden in terms of wallclock time is similar to measuring the computational burden in terms of E in that, in general, it requires a probabilistic calculation that accounts for the fact that not all of the runs in a series are successful.
Table 8.12 shows an analysis of the wallclock time for a series of runs of the 64-square lawnmower problem with automatically defined functions. A new series of 414 runs was made because the previous series of 76 runs (used to make figure 8.26) did not contain timestamps for each individual generation.

Table 8.12 Analysis of wallclock time for the 64-square lawnmower problem with ADFs.
Generation   Duration of   Cumulative     P(M,i)     R(M,i,z)   W(M,i,z)
             generation    elapsed time
0            30.25         30.25          0.00%
1            15.49         45.74          5.07%      89         4,070.86
2            16.98         62.72          12.08%     36         2,257.92
3            18.50         81.22          28.74%     14         1,137.08
4            17.94         99.16          50.48%     7          694.12
5            16.15         115.31         71.50%     4          461.24
6            12.54         127.85         83.57%     3          383.55
7            10.88         138.73         90.82%     2          277.46
8            8.00          146.73         100.00%    1          146.73

Column 2 shows the average duration, in seconds, for each generation.
Column 3 shows the cumulative elapsed time for the generations.
Column 4 states, as a percentage, the value of the observed cumulative probability of success, P(M,i), for the 64-square lawnmower problem with automatically defined functions for this series of 414 runs. The values of P(M,i) in this table of observed values are similar to (but, of course, slightly different than) the values of P(M,i) obtained in the previous series (of 76 runs) used to make the performance curves in figure 8.26. For example, P(M,i) reached a value of 85.53% for generation 6 for the previous series of 76 runs and 83.57% for the series of 414 runs. P(M,i) reached a value of 97.37% for generation 8, 98.68% for generation 9, and 100% for generation 10 in the previous series of 76 runs, whereas it reached 100% for generation 8 in the series of 414 runs.
Column 5 shows the number of independent runs, R(M,i,z), required to yield a solution to the problem with a satisfactorily high probability of z = 99% associated with the value of P(M,i) in column 4.
Column 6 of table 8.12 shows W(M,i,z), the amount of wallclock time that must be expended in order to yield a solution (or satisfactory result) for a problem with a probability of z, for a population size M, by generation i. W(M,i,z) is measured in seconds.
Note that the time required to create the initial random population in generation 0 is included for generation 0 in the table. Because of this, the average
duration shown for generation 0 is about twice the duration for other early
generations for this particular problem.
For generation 1, table 8.12 shows that the observed cumulative probability
of success, P(M,i), is a mere 5.07%. With this low observed cumulative probability of success, a total of R(M,i,z) = 89 independent runs are required to
solve this problem with a probability of 99%. The average cumulative elapsed
time for a run to generation 1 is 45.74 seconds. Thus, the amount of computer
time, W(M,i,z), required to yield a solution with 99% probability is 4,070.86
seconds (about 1.1 hours) if this problem is run to generation 1 and abandoned.
For generation 6, the observed cumulative probability of success, P(M,i), is
83.57%. Consequently, R(M,i,z) is now only 3. The average cumulative elapsed
time for a run to generation 6 is 127.85 seconds. Thus, the amount of computer time, W(M,i,z), necessary to yield a solution is 383.55 seconds (about
6.4 minutes) if this problem is run to generation 6 and abandoned.
On generation 8, the observed cumulative probability of success, P(M,i),
reaches 100%, so R(M,i,z) = 1. The average elapsed time for one run to generation 8 is 146.73 seconds, so the amount of computer time, W(M,i,z), necessary to yield a solution is 146.73 seconds (about 2.4 minutes) if this problem is
run to generation 8 and abandoned. Generation 8 is the best generation and
R(z) is 1 for generation 8.
We define the wallclock time with automatically defined functions, W_with, as the
minimum value, over the generations, of W(M,i,z) with ADFs. For the
64-square lawnmower problem with ADFs, W_with is 146.73 seconds.
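The calculation of W(M,i,z) and of its minimum, W_with, can be sketched as follows. The durations and probabilities are taken from table 8.12; the function names are hypothetical, and the formula for R(M,i,z) is assumed to be R = ceil(log(1 - z) / log(1 - P(M,i))), which reproduces column 5 of the table.

```python
import math

# Average duration of each generation (seconds) and observed P(M,i), from table 8.12.
DURATIONS = [30.25, 15.49, 16.98, 18.50, 17.94, 16.15, 12.54, 10.88, 8.00]
P_SUCCESS = [0.0, 0.0507, 0.1208, 0.2874, 0.5048, 0.7150, 0.8357, 0.9082, 1.0]

def runs_required(p_success, z=0.99):
    """Assumed formula: R = ceil(log(1 - z) / log(1 - P)); one run if P is 100%."""
    if p_success >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def wallclock_rows(durations, p_success, z=0.99):
    """Return (generation, R, W) for every generation with at least one success."""
    rows = []
    elapsed = 0.0
    for generation, (duration, p) in enumerate(zip(durations, p_success)):
        elapsed += duration  # cumulative elapsed time of a run to this generation
        if p <= 0.0:
            continue         # no success observed yet, so W is undefined
        runs = runs_required(p, z)
        rows.append((generation, runs, runs * elapsed))
    return rows

rows = wallclock_rows(DURATIONS, P_SUCCESS)
best_generation, best_runs, w_with = min(rows, key=lambda row: row[2])
print(f"W_with = {w_with:.2f} seconds at generation {best_generation}")
```

Abandoning runs early lowers the elapsed time per run but raises the number of runs required, so W(M,i,z) is the product of two quantities that move in opposite directions as the generation number increases.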
Figure 8.40 contains the wallclock performance curves for the 64-square
lawnmower problem with auiomatically defined functions. This figure is
constructed in the same general way as all the other performance curves in
this book. The rising curve is the cumulative probability of success/ P(M,i).
270 Chapter 8
Table 8.13 Analysis of wallclock time for the 64-square lawnmower problem
withoutADFs.
Generation Duration of
generation
Cumulative P(M,i)
elapsed time
R(M,i,,z) W(M,i.,z)
0
1
2
3
4
5
6
7
8
9
t 0
11.
t2
13
L4
15
t6
17
18
t9
20
2L
22
23
24
25
26
27
28
29
30
31,
32
33
u
35
36
37
38
39
40
41
42
43
M
45
46
47
48
49
20.M
7.70
9.56
L3.26
17.M
19.63
20.33
27.93
u.96
36.37
41.26
M.56
42.19
47.?2
57.37
63.78
50.04
45.04
38.70
37.63
41,.63
52.19
53.89
47.77
46.60
54.48
48.05
69.41,
59.38
43.06
58.33
54.1,4
51.85
57.08
75.08
5L.17
36.67
32.91,
37.r0
40.50
36.50
22.88
29.60
24.75
64.67
73.00
72.00
43.33
34.00
32.67
20.M
28.15
37.70
50.96
68.41,
88.04
108.37
L36.30
171,.26
207.63
248.89
293.M
335.63
382.85
M0.?2
504.00
554.04
599.07
637.78
675.41,
7I7.04
769.22
923.L1
870.88
9L7.48
971..96
1.,020.0'J.
1.,089.42
1,,L48.80
L,L91.86
1.,250.19
1.,304.33
"1.,356.18
1,413.26
7,488.33
1,539.50
1.,576.17
1.,609.08
1,,646.18
1.,686.68 '1,723.18
1,,746.05
1,,775.65
1",800.40
1,865.07
1,938.07
2,010.07
2,053.40
2,097.40
2,120.07
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.61
3.68
19.63
25.1.5
33.74
35.58
35.58
39.88
46.01
51.53
51.53
5L.53
52.15
52.15
54.60
55.83
55.83
55.83
67.48
85.89
90.18
92.&
92.64
92.&
92.64
92.64
92.4
92.&
92.64
575,1.47
10t,243
19,159
1.4,680
11.,6&
11,220
\L,9U
1L,488
9,535
8,751.
9,r30
9,493
9,893
10,418
9,237
9,457
9,654
9,877
8433
5,r70
3,492
3,551
3,60'L
3,730
3,876
4,020
4,\07
4,175
4,240
,*
123
22
1.6
12
11
L1
t0
8
7
7
7
7
7
6
6
6
6
5
3
2
2
2
.,
L
2
1
L
2
2
2
The Lawnmower Problem
Na
. I
G
tsl
tj
z
?
l- Prr-l'l I
l* w(tut, ir) |
l-M = lpoo I
I z=99%o I
lR(z;=1 I
I N=414 |
^ 100
rS
a
a
()
I
I
t
-
ct)
b50
h
I
.-
-
.-
A
-
c!
A
-(
|i
A ,
-
With Defined FunctionS
(l,5Vo) 4
Generation
Figure 8.40 Wallclock-performance curves for the 64-square lawnmower problem with ADFs.
The second curve isW(M, i, z) fuomtable 8.12. The minimum value, W*i6, oI
1,46.73 for W(M, i, z) altatned at generation 8 is shown in the oval along with
the number of the generation (8) on which it is attained.
We now determine the wallclock time, W_without, without automatically
defined functions.
Table 8.13 shows an analysis of the wallclock time for a series of runs of the
64-square lawnmower problem without automatically defined functions.
Column 4 presents the value of the observed cumulative probability of
success, P(M, i), for the 64-square lawnmower problem without automatically defined functions. These values are based on a separate series of 163
runs; these values are very similar to the values in figure 8.13. As can be seen,
the average duration of a generation without automatically defined functions
is about the same for the first few generations of this table as in table 8.12 with
automatically defined functions; however, the durations grow considerably
for later generations of this table. Significantly, the observed cumulative probability of success, P(M, i), is only 92.64% by generation 49 without automatically defined functions as compared to 90.82% for generation 7 with
automatically defined functions (table 8.12).
For generation 49, R(M, i, z) is only 2 for table 8.13 (without automatically
defined functions). The average elapsed time for one run to generation 49 is
2,120 seconds (about 35 minutes), so the amount of computer time, W(M, i, z),
necessary to yield a solution is 4,240 seconds (about 71 minutes) if this problem is run to generation 49 and abandoned.
Based on table 8.13, the wallclock time, W_without, without automatically
defined functions for the 64-square version of this problem is 4,240 seconds. This is 28.9 times longer than the wallclock time, W_with, with automatically defined functions for this version.
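The arithmetic behind these figures can be retraced with a short Python sketch. It assumes the usual definition of the number of independent runs, R(z) = ceil(log(1 - z) / log(1 - P(M, i))), which is consistent with the R values quoted above but is not restated in this passage; the function names are illustrative only.

```python
import math

def runs_required(p_success, z=0.99):
    """Runs needed so that at least one succeeds with probability z,
    given the observed per-run success probability p_success."""
    if p_success <= 0.0:
        raise ValueError("no successes observed; R(z) is undefined")
    if p_success >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def wallclock_to_solution(p_success, seconds_per_run, z=0.99):
    """W(M, i, z): runs required times the average elapsed time of one run."""
    return runs_required(p_success, z) * seconds_per_run

# Values quoted in the text for the 64-square lawnmower problem.
w_without = wallclock_to_solution(0.9264, 2120.07)  # generation 49, table 8.13
w_with = 146.73                                     # minimum with ADFs, table 8.12
wallclock_ratio = w_without / w_with

print(runs_required(0.9264), round(w_without), round(wallclock_ratio, 1))
```

Under these assumptions the sketch reproduces the two runs, roughly 4,240 seconds, and the ratio of about 28.9 reported above.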
Problem size    Wallclock ratio R_W
32              6.13
48              10.4
64              28.9
80              68.5
96              1049.0
The wallclock ratio, R_W, is the ratio of the average wallclock time, W_without,
without automatically defined functions to the average wallclock time, W_with,
with automatically defined functions.
$$R_W = \frac{W \text{ without ADFs}}{W \text{ with ADFs}} = \frac{W_{without}}{W_{with}}$$
For the 64-square lawnmower problem, the wallclock ratio, R_W, is 28.9.
Table 8.14 shows the wallclock ratio, R_W, for the lawnmower problem with
lawn sizes of 32, 48, 64, 80, and 96. As can be seen, all five wallclock ratios are
considerably greater than 1, indicating that runs with automatically defined
functions require less wallclock time than the runs without automatically
defined functions for this problem.
In other words, automatically defined functions are beneficial both in terms
of computational effort, E, and wallclock time, W, for this problem.
A similar table appears in section 9.14 and shows that less wallclock time is
required with automatically defined functions than without them for all four
sizes of the bumblebee problem. Wallclock time is revisited in section 10.2.
Table 8.14 Wallclock ratios, R_W, for the lawnmower problem (columns: problem size and wallclock ratio R_W).
The Bumblebee Problem
This chapter examines a problem in the domain of floating-point numbers
that was especially constructed to permit the study of scaling. The goal is to
find a program for controlling the movement of a bumblebee so that it visits
all the locations in the plane containing flowers. The bumblebee problem is
scaled in terms of the number of flowers to be visited. Four progressively
more difficult versions of this problem will be run, each with and without
automatically defined functions.
The bumblebee problem provides another example of a problem in the
domain of floating-point numbers.
9.1 THE PROBLEM
The location of each flower is specified by a two-dimensional vector of
floating-point coordinates. The bee starts at the origin (0.00,0.00). The
x-location of a flower is a randomly chosen floating-point number between
-5.00 and +5.00; the y-location is also a randomly chosen floating-point
number between -5.00 and +5.00. No flower can be within the square of
side 0.02 centered on any other flower or within the square of side 0.02
centered at the origin. The number of flowers is 25, 20, 15, and 10 in the
four versions of the problem.
9.2 PREPARATORY STEPS WITHOUT ADFs
The terminal set for this problem consists of vectors with floating-point components. Specifically,
T = {BEE, NEXT-FLOWER, ℜ_real-vector}.
BEE is the current location of the bumblebee in the plane expressed as a
two-dimensional vector of floating-point values.
NEXT-FLOWER is a terminal that is set to the position of a randomly chosen
unvisited flower belonging to the current fitness case.
Each random constant ℜ_real-vector consists of a vector (x, y), each component of which is a floating-point value between -5.0000 and +5.0000.
Figure 9.1 Two fitness cases for the bumblebee problem with 25 flowers.
The function set consists of
F = {V+, V-, GO-X, GO-Y, PROGN}
with an argument map of
{2, 2, 1, 1, 2}.
V+ and V- are two-argument functions for floating-point vector addition
and subtraction.
GO-X takes a single vector as its argument and moves the bee the distance
in the x-direction specified by the x-component of its vector argument. GO-X
always returns (0.0, 0.0).
GO-Y operates in a similar way in the y-direction.
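The behavior of these primitives can be sketched in Python as follows. This is only an illustration of the description above, not the implementation used for the runs; the `Bee` class, its `moves` counter, and the function names are assumptions introduced here.

```python
class Bee:
    """Hypothetical container for the bee's state; it starts at the origin."""
    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.moves = 0  # assumed bookkeeping for the per-case movement limit

def v_add(a, b):
    """V+: component-wise vector addition."""
    return (a[0] + b[0], a[1] + b[1])

def v_sub(a, b):
    """V-: component-wise vector subtraction."""
    return (a[0] - b[0], a[1] - b[1])

def go_x(bee, v):
    """GO-X: move the bee in the x-direction by the x-component of v."""
    bee.x += v[0]
    bee.moves += 1
    return (0.0, 0.0)

def go_y(bee, v):
    """GO-Y: move the bee in the y-direction by the y-component of v."""
    bee.y += v[1]
    bee.moves += 1
    return (0.0, 0.0)

bee = Bee()
result = go_x(bee, v_sub((1.5, 9.0), (0.0, 0.0)))
print(bee.x, bee.y, result)
```

Because the movement functions act through side effects and always return (0.0, 0.0), the values passed up the program tree matter less than the order in which the movements fire.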
Because this problem is time-consuming and because we need multiple,
successful runs for all eight versions of this problem to do the desired
analysis, we compromised on the number of fitness cases by allocating
only enough computer time to this problem to support two fitness cases
for each run.
Figure 9.1 shows the two fitness cases for this problem for a run with 25
flowers.
Each program is evaluated once for each fitness case. The raw fitness of
a particular program is the sum, over the two fitness cases, of the number
of flowers visited by the bumblebee. If there are 25 flowers, raw fitness
varies between 0 and 50. The bee is deemed to have reached a flower when
it enters the square of side 0.02 centered on the flower. We use a square
rather than a circle because less computer time is required to compute the
bee's arrival within the square. If the bee reaches a flower, the bee is credited with visiting it regardless of whether the flower is the one designated
by NEXT-FLOWER. The bumblebee is limited to 100 movements per fitness case and it receives credit for all flowers visited in the current fitness
case when this limit is reached.
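A minimal Python sketch of this fitness measure follows. It assumes that a trajectory is recorded as the list of bee positions after each movement and that the boundary of the square counts as inside; both are assumptions made for illustration, as are all the names.

```python
HALF_SIDE = 0.01  # half of the 0.02 side of the square centered on a flower

def reaches(bee_xy, flower_xy, half_side=HALF_SIDE):
    """True if the bee lies inside the square centered on the flower."""
    return (abs(bee_xy[0] - flower_xy[0]) <= half_side
            and abs(bee_xy[1] - flower_xy[1]) <= half_side)

def flowers_visited(path, flowers, max_moves=100):
    """Count distinct flowers reached within the first max_moves positions."""
    visited = set()
    for position in path[:max_moves]:
        for index, flower in enumerate(flowers):
            if reaches(position, flower):
                visited.add(index)
    return len(visited)

def raw_fitness(paths, fitness_cases, max_moves=100):
    """Sum of flowers visited over all fitness cases (two in the text)."""
    return sum(flowers_visited(p, f, max_moves)
               for p, f in zip(paths, fitness_cases))

def standardized_fitness(raw, flowers_per_case=25, cases=2):
    """Maximum possible raw fitness (e.g., 50) minus the raw fitness."""
    return flowers_per_case * cases - raw

flowers = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
path = [(1.005, 0.995), (2.5, 2.0), (3.0, 3.0)]
print(flowers_visited(path, flowers))
```

The square test needs only two absolute-value comparisons per flower, which matches the stated reason for preferring a square to a circle.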
Table 9.1 summarizes the key features of the bumblebee problem with 25
flowers without automatically defined functions.
Table 9.1 Tableau without ADFs for the bumblebee problem with 25 flowers.
Objective: Find a program to control a bumblebee so that it visits all 25 flowers in the plane.
Terminal set without ADFs: BEE, NEXT-FLOWER, and the random constants ℜ_real-vector.
Function set without ADFs: V+, V-, GO-X, GO-Y, and PROGN.
Fitness cases: Two fitness cases, each consisting of 25 randomly chosen vector locations in the plane.
Raw fitness: Raw fitness is the sum, over the two fitness cases, of the number of flowers (from 0 to 50) visited before the maximum number of movements per fitness case is exceeded.
Standardized fitness: Standardized fitness is twice the number of flowers (i.e., 50) minus raw fitness.
Hits: Same as raw fitness.
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program scores the maximum number of hits.
9.3 RESULTS WITH 25 FLOWERS WITHOUT ADFs
The following 525-point program visiting all 25 flowers in both fitness cases
emerged on generation 37 of one run:
(V+ (PROGN (V+ (PROGN (V- (GO-X (GO-X NEXT-FLOWER)) (V- (PROGN
(GO-Y (PROGN (V- (GO-Y NEXT-FLOWER) (GO-X NEXT-FLOWER)) (PROGN
(GO-X (V- NEXT-FLOWER BEE)) (V- NEXT-FLOWER BEE)))) (V- (V- (GO-Y
(V- NEXT-FLOWER BEE)) (V+ BEE (-3.4751,4.0123))) (PROGN (PROGN
(GO-X (V- NEXT-FLOWER BEE)) (V- BEE NEXT-FLOWER)) NEXT-FLOWER))
(GO-X (GO-Y (V- NEXT-FLOWER BEE))))) (V- (PROGN (V+ (PROGN (V+
(GO-X NEXT-FLOWER) NEXT-FLOWER) (V- (PROGN (PROGN (V- (GO-Y
NEXT-FLOWER) (V+ BEE (-3.4751,4.0123))) BEE) (GO-X (GO-Y (V-
NEXT-FLOWER BEE)))) (PROGN (PROGN (GO-X (V- NEXT-FLOWER BEE))
(V- BEE NEXT-FLOWER)) NEXT-FLOWER))) (PROGN (V- (GO-Y (PROGN
(GO-X (V- NEXT-FLOWER BEE)) (V- NEXT-FLOWER BEE))) (GO-Y
NEXT-FLOWER)) (GO-X NEXT-FLOWER))) (V- (PROGN (PROGN (V+ BEE
(GO-Y (V- NEXT-FLOWER BEE))) BEE) (GO-X NEXT-FLOWER)) (PROGN
(PROGN (GO-X (V- NEXT-FLOWER BEE)) (V- BEE NEXT-FLOWER))
NEXT-FLOWER))) (GO-X (GO-Y (V- NEXT-FLOWER BEE))))) (PROGN (V-
(GO-X (PROGN (PROGN (GO-X (V- NEXT-FLOWER BEE)) (GO-Y (V-
NEXT-FLOWER BEE))) NEXT-FLOWER) (PROGN (V- NEXT-FLOWER BEE)
(PROGN BEE NEXT-FLOWER)) (GO-Y (PROGN (GO-X (V- NEXT-FLOWER
BEE)) (V- NEXT-FLOWER BEE))))) (PROGN (V- (PROGN (V+ (GO-X
NEXT-FLOWER) (V+ (GO-X (GO-Y (V- NEXT-FLOWER BEE))) (V+ (PROGN
(GO-X NEXT-FLOWER) (V+ (0.55423,4.9729) (GO-X BEE))) (V+ (GO-Y
(GO-X (V- NEXT-FLOWER BEE))) (GO-X (GO-X (V- NEXT-FLOWER
BEE))))))) (V- (V- (GO-Y NEXT-FLOWER) (GO-X NEXT-FLOWER)) (PROGN
(PROGN (GO-X (V- NEXT-FLOWER BEE)) (V- BEE NEXT-FLOWER))
NEXT-FLOWER))) (GO-X (GO-Y (V- NEXT-FLOWER BEE)))) (PROGN (GO-X
(PROGN (V+ (GO-X NEXT-FLOWER) (V- NEXT-FLOWER BEE)) (V- (PROGN
(GO-Y (PROGN (V- (GO-Y NEXT-FLOWER) (GO-X NEXT-FLOWER)) (PROGN
(GO-X (V- NEXT-FLOWER BEE)) (V- NEXT-FLOWER BEE)))) (V- (GO-X
(GO-Y (V- NEXT-FLOWER BEE))) (PROGN (PROGN (GO-X (V- NEXT-FLOWER
BEE)) (V- BEE NEXT-FLOWER)) NEXT-FLOWER))) (GO-X (GO-Y (V-
NEXT-FLOWER BEE)))))) (GO-X BEE)))) (PROGN (V+ (PROGN NEXT-FLOWER
NEXT-FLOWER) (PROGN (V- (PROGN (GO-Y (PROGN (PROGN (PROGN (GO-X
(V- NEXT-FLOWER BEE)) (V- BEE NEXT-FLOWER)) NEXT-FLOWER) (PROGN
(GO-X (V- NEXT-FLOWER BEE)) (V- NEXT-FLOWER BEE)))) (V- (V- (GO-Y
NEXT-FLOWER) (V+ BEE (-3.4751,4.0123))) (PROGN (PROGN (V+ BEE
(GO-Y (V- NEXT-FLOWER BEE))) BEE) NEXT-FLOWER))) (PROGN (PROGN
(GO-Y (PROGN (PROGN (PROGN (GO-X (V- NEXT-FLOWER BEE)) (V- BEE
NEXT-FLOWER)) NEXT-FLOWER) (PROGN (GO-X (V- NEXT-FLOWER BEE))
(V- NEXT-FLOWER BEE)))) (V- (V- (GO-Y NEXT-FLOWER) (V+ BEE
(-3.4751,4.0123))) (PROGN (PROGN (V+ BEE (GO-Y (V- NEXT-FLOWER
BEE))) BEE) NEXT-FLOWER))) (PROGN BEE NEXT-FLOWER))) (V- (V- (GO-X
NEXT-FLOWER) (V+ (PROGN (GO-Y NEXT-FLOWER) (PROGN (V+ (V+ (GO-X
(GO-X BEE)) (V- BEE NEXT-FLOWER)) (GO-Y (GO-X (V- NEXT-FLOWER
BEE)))) (V- NEXT-FLOWER BEE))) (V+ BEE (GO-Y (V- NEXT-FLOWER
BEE))))) (PROGN (PROGN (V- NEXT-FLOWER BEE) (V- (PROGN (PROGN (V+
BEE (GO-Y (V- NEXT-FLOWER BEE))) BEE) (GO-X NEXT-FLOWER)) (PROGN
(PROGN (GO-X (V- NEXT-FLOWER BEE)) (V- BEE NEXT-FLOWER))
NEXT-FLOWER))) (GO-Y (PROGN (GO-Y NEXT-FLOWER) (PROGN (V+ (V+ (GO-X
(GO-X BEE)) (GO-X NEXT-FLOWER)) (GO-Y (GO-X (V- NEXT-FLOWER
BEE)))) (V- NEXT-FLOWER BEE)))))))) (PROGN (V- (PROGN (V+ (GO-X
NEXT-FLOWER) (V+ (PROGN (PROGN (1.51137,1.49552) NEXT-FLOWER)
(GO-Y NEXT-FLOWER)) (V+ (PROGN (GO-X NEXT-FLOWER) (V+
(0.55423,4.9729) (GO-X BEE))) (V+ (GO-Y (GO-X (GO-Y (V-
NEXT-FLOWER BEE)))) (GO-X (GO-X (V- NEXT-FLOWER BEE))))))) (V- (V-
(GO-Y NEXT-FLOWER) (V+ NEXT-FLOWER (-3.4751,4.0123))) (PROGN (PROGN
(GO-X (V- NEXT-FLOWER BEE)) (V- BEE NEXT-FLOWER)) NEXT-FLOWER)))
(GO-X (GO-Y (V- NEXT-FLOWER BEE)))) (PROGN (GO-X (PROGN (V+ (GO-X
NEXT-FLOWER) (V- NEXT-FLOWER BEE)) (V- (PROGN (GO-Y (PROGN (V-
(GO-Y NEXT-FLOWER) (GO-X NEXT-FLOWER)) (PROGN (GO-X (V-
NEXT-FLOWER BEE)) (V- NEXT-FLOWER BEE)))) (V- (V- (GO-X NEXT-FLOWER)
(V+ BEE (-3.4751,4.0123))) (PROGN (PROGN (GO-X (V- NEXT-FLOWER
BEE)) (V- BEE NEXT-FLOWER)) NEXT-FLOWER))) (GO-X (GO-Y (V-
NEXT-FLOWER BEE)))))) (GO-X BEE)))))
Figure 9.2 shows, for one of the two fitness cases, the trajectory of the bumblebee as it visits all 25 flowers under the control of the above best-of-run program from generation 37 without automatically defined functions.
The average structural complexity, S_without, of the best-of-run programs from
the 27 successful runs (out of 34 runs) of the bumblebee problem with 25
flowers is 452.0 points without automatically defined functions.
Figure 9.2 Trajectory of bumblebee visiting 25 flowers without ADFs.
For the bumblebee problem with 25 flowers, figure 9.3 presents the performance curves based on the 34 runs of this problem without automatically defined functions. The cumulative probability of success, P(M, i), is
6% by generation 25 and is 79% by generation 50. The two numbers in the
oval indicate that if this problem is run through to generation 50, processing a total of E_without = 612,000 individuals (i.e., 4,000 x 51 generations x 3
runs) is sufficient to yield a satisfactory result for this problem with
99% probability.
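The count of 612,000 individuals can be retraced with a short Python sketch. It assumes the standard relation R(z) = ceil(log(1 - z) / log(1 - P(M, i))) for the number of independent runs, which is not restated in this section; the function names are illustrative.

```python
import math

def runs_required(p_success, z=0.99):
    """Independent runs needed for at least one success with probability z."""
    if p_success <= 0.0:
        raise ValueError("no successes observed; R(z) is undefined")
    if p_success >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def individuals_to_process(population, generation, p_success, z=0.99):
    """I(M, i, z) = M x (i + 1) generations x R(z) runs."""
    return population * (generation + 1) * runs_required(p_success, z)

# 27 of 34 runs (about 79%) had succeeded by generation 50 without ADFs.
effort_without = individuals_to_process(4000, 50, 27 / 34)
print(runs_required(27 / 34), effort_without)
```

The factor (i + 1) reflects that generation 0 is also evaluated, which is why a run to generation 50 counts as 51 generations.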
9.4 PREPARATORY STEPS WITH ADFs
In applying genetic programming with automatically defined functions to
the bumblebee problem, we decided that each overall program in the population would consist of one one-argument automatically defined function and
one result-producing branch.
The terminal set, T_adf, for ADF0 is
T_adf = {ARG0, BEE, ℜ_real-vector}.
The function set, F_adf, for ADF0 is
F_adf = {V+, V-, GO-X, GO-Y, PROGN}
with an argument map of
{2, 2, 1, 1, 2}.
The body of ADF0 is a composition of primitive functions from the function set, F_adf, and terminals from the terminal set, T_adf.
The terminal set, T_rpb, for the result-producing branch is
T_rpb = {BEE, NEXT-FLOWER, ℜ_real-vector}.
The function set, F_rpb, for the result-producing branch is
F_rpb = {ADF0, V+, V-, GO-X, GO-Y, PROGN}
with an argument map of
{1, 2, 2, 1, 1, 2}.
The result-producing branch is a composition of the functions from the
function set, F_rpb, and terminals from the terminal set, T_rpb.
Figure 9.3 Performance curves for the bumblebee problem with 25 flowers showing that E_without = 612,000 without ADFs.
Table 9.2 summarizes the key features of the bumblebee problem with 25
flowers with automatically defined functions.
9.5 RESULTS WITH 25 FLOWERS WITH ADFs
In one run of the bumblebee problem with 25 flowers with automatically
defined functions, the following 100%-correct 219-point program scoring 50
(out of 50) emerged in generation 18:
(progn (defun ADF0 (ARG0)
         (values (GO-X (V+ (GO-Y (V- ARG0 BEE)) (V- ARG0 BEE)))))
       (values (V- (PROGN (V- (PROGN (V- (V- (PROGN (V- (GO-X
        NEXT-FLOWER) (GO-Y NEXT-FLOWER)) NEXT-FLOWER) (V- NEXT-FLOWER
        (ADF0 NEXT-FLOWER))) (V- NEXT-FLOWER (V+ (ADF0 (PROGN (V- (GO-X
        NEXT-FLOWER) (GO-Y NEXT-FLOWER)) NEXT-FLOWER)) (V- (V+ (V+ (GO-Y
        NEXT-FLOWER) (PROGN (GO-Y NEXT-FLOWER) (ADF0 NEXT-FLOWER))) (ADF0
        (V+ (GO-Y NEXT-FLOWER) (V- NEXT-FLOWER BEE)))) (V- NEXT-FLOWER
        (ADF0 NEXT-FLOWER)))))) (V- (PROGN (V- (GO-X NEXT-FLOWER) (GO-Y
        NEXT-FLOWER)) (V- NEXT-FLOWER (V- NEXT-FLOWER (ADF0 (V+ (ADF0
        NEXT-FLOWER) (V- NEXT-FLOWER (ADF0 NEXT-FLOWER))))))) (V- (V-
        (PROGN (V- (V- (PROGN (V- (GO-X NEXT-FLOWER) (GO-Y NEXT-FLOWER))
        NEXT-FLOWER) (V- NEXT-FLOWER (ADF0 NEXT-FLOWER))) (GO-Y
        NEXT-FLOWER)) NEXT-FLOWER) (V- (V+ (V- (V- (GO-X NEXT-FLOWER)
        (GO-Y NEXT-FLOWER) (ADF0 NEXT-FLOWER) (V+ NEXT-FLOWER (PROGN
        NEXT-FLOWER BEE))) (V- NEXT-FLOWER (ADF0 NEXT-FLOWER)))) (V-
        NEXT-FLOWER (ADF0 NEXT-FLOWER))))) (V- NEXT-FLOWER (V+ (ADF0
        NEXT-FLOWER) (V- (V+ (V+ (GO-Y NEXT-FLOWER) (PROGN (GO-Y (GO-Y
        (GO-X BEE))) (ADF0 NEXT-FLOWER))) (ADF0 (V+ (GO-Y NEXT-FLOWER)
        (V- NEXT-FLOWER BEE)))) (V- NEXT-FLOWER (ADF0 NEXT-FLOWER)))))
        (GO-X (V+ (ADF0 (PROGN (V- (GO-X NEXT-FLOWER) (V- NEXT-FLOWER
        (V+ (PROGN (PROGN NEXT-FLOWER BEE) (ADF0 NEXT-FLOWER)) (V- (ADF0
        NEXT-FLOWER) (V- (ADF0 NEXT-FLOWER) (GO-Y NEXT-FLOWER))))))
        NEXT-FLOWER)) (V- (ADF0 (V+ (ADF0 NEXT-FLOWER) (V- NEXT-FLOWER
        (ADF0 (V+ (ADF0 NEXT-FLOWER) NEXT-FLOWER))))) (ADF0
        NEXT-FLOWER)))) (V- NEXT-FLOWER (V- (PROGN (V- (GO-X (ADF0 (V-
        NEXT-FLOWER BEE))) (GO-Y NEXT-FLOWER)) (ADF0 (V+ (ADF0
        NEXT-FLOWER) (V- NEXT-FLOWER (ADF0 (V+ (ADF0 NEXT-FLOWER)
        NEXT-FLOWER)))))) (V- (ADF0 NEXT-FLOWER) (V+ (V- NEXT-FLOWER
        (ADF0 NEXT-FLOWER)) (GO-X NEXT-FLOWER)))))))).
Table 9.2 Tableau with ADFs for the bumblebee problem with 25 flowers.
Objective: Find a program to control a bumblebee so that it visits all 25 randomly located flowers.
Architecture of the overall program with ADFs: One result-producing branch and one one-argument function-defining branch.
Parameters: Branch typing.
Terminal set for the result-producing branch: BEE, NEXT-FLOWER, and the random constants ℜ_real-vector.
Function set for the result-producing branch: ADF0, V+, V-, GO-X, GO-Y, and PROGN.
Terminal set for the function-defining branch ADF0: ARG0, BEE, and the random constants ℜ_real-vector.
Function set for the function-defining branch ADF0: V+, V-, GO-X, GO-Y, and PROGN.
In this program, ADF0 moves the bee in the x-direction by the difference of
ARG0 and BEE and then moves the bee in the y-direction by the difference of
ARG0 and BEE.
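The net effect of ADF0 can be checked with a small Python sketch of its body, (GO-X (V+ (GO-Y (V- ARG0 BEE)) (V- ARG0 BEE))). The sketch assumes that arguments are evaluated left to right and that BEE is re-read each time it appears; under that assumption the GO-Y movement actually fires before the GO-X movement, although the bee ends at ARG0 either way. The `Bee` class and helper names are hypothetical.

```python
class Bee:
    """Hypothetical bee state; BEE is read as the current (x, y) position."""
    def __init__(self, x=0.0, y=0.0):
        self.x, self.y = x, y

    @property
    def position(self):
        return (self.x, self.y)

def v_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def v_sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def go_x(bee, v):
    bee.x += v[0]
    return (0.0, 0.0)

def go_y(bee, v):
    bee.y += v[1]
    return (0.0, 0.0)

def adf0(bee, arg0):
    """(GO-X (V+ (GO-Y (V- ARG0 BEE)) (V- ARG0 BEE)))"""
    first = go_y(bee, v_sub(arg0, bee.position))  # y-move, returns (0.0, 0.0)
    second = v_sub(arg0, bee.position)            # BEE re-read after the y-move
    return go_x(bee, v_add(first, second))        # x-move by the remaining gap

bee = Bee(1.0, 2.0)
returned = adf0(bee, (3.0, -4.0))
print(bee.position, returned)
```

Calling ADF0 with NEXT-FLOWER as its argument therefore carries the bee to the designated flower in two movements, which suggests why the result-producing branch invokes it so often.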
Figure 9.4 shows, for one of the two fitness cases, the trajectory of the bee
visiting the 25 flowers for this 219-point program with automatically defined
functions.
Figure 9.4 Trajectory of bumblebee visiting 25 flowers with ADFs.
The average structural complexity, S_with, of best-of-run programs from the
31 successful runs (out of 31 runs) of the bumblebee problem with 25 flowers
with automatically defined functions is 245.9 points.
In comparing the solutions obtained with and without automatically
defined functions, it is obvious that the 525-point solution without automatically defined functions shown in section 9.3 (which is reasonably close to the
average size of 452.0 points) is much larger than the 219-point solution with
automatically defined functions (which is reasonably close to the average size
of 245.9 points).
For the bumblebee problem with 25 flowers, figure 9.5 presents the performance curves based on the 31 runs of this problem with automatically
defined functions. The cumulative probability of success, P(M, i), is 100% by
generation 47. The two numbers in the oval indicate that if this problem is
run through to generation 47, processing a total of E_with = 192,000 individuals
(i.e., 4,000 x 48 generations x 1 run) is sufficient to yield a satisfactory result
for this problem with 99% probability.
Since the bee ought to be able to perform some kind of generalized calculation in deciding how to navigate toward the next flower, there is considerable
regularity and symmetry in this problem environment.
It is certainly not obvious from examining the bumblebee's trajectory in
figure 9.4 that automatically defined functions have successfully exploited
the considerable regularity of this problem environment. In fact, the overall
impression created by figure 9.4 for the case with automatically defined functions does not appear to be fundamentally different from the tangled and
disorderly appearance of figure 9.2 for the case without automatically
defined functions. However, even though it is not visually obvious from the
trajectory that automatically defined functions have successfully exploited
the considerable regularity of this problem environment, there is evidence
that they have done so in the form of the two performance curves. When one
sees the difference in computational effort of 612,000 versus 192,000, the
advantageous effect of automatically defined functions is unmistakable. For
this problem, the statistics provide the means for seeing that the evolved programs employing automatically defined functions succeed in exploiting the
problem environment in a different and better way than the evolved programs not employing automatically defined functions. The human observer
is often not able to understand or visualize how automatically defined functions exploit the problem environment.
Figure 9.5 Performance curves for the bumblebee problem with 25 flowers showing that E_with = 192,000 with ADFs.
Table 9.3 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, with automatically
defined functions and without them for the bumblebee problem with
25 flowers.
Figure 9.6 summarizes the information in the table for the bumblebee
problem with 25 flowers and shows a structural complexity ratio, R_S, of 1.84
and an efficiency ratio, R_E, of 3.20.
Table 9.3 Comparison table for the bumblebee problem with 25 flowers.
                                   Without ADFs    With ADFs
Average structural complexity S    452.0           245.9
Computational effort E             612,000         192,000
Figure 9.6 Summary graphs for the bumblebee problem with 25 flowers.
9.6 RESULTS WITH 20 FLOWERS WITHOUT ADFs
We then scaled this problem down so that only 20 flowers are visited for each
fitness case.
The average structural complexity, S, of the best-of-run programs from the
35 successful runs (out of 36 runs) without automatically defined functions is
386.9 points for the bumblebee problem with 20 flowers.
For the bumblebee problem with 20 flowers, figure 9.7 presents the performance curves based on the 36 runs of this problem without automatically defined functions. The cumulative probability of success, P(M, i), is
92% by generation 40 and 97% by generation 50. The two numbers in the
oval indicate that if this problem is run through to generation 40, processing
a total of E_without = 328,000 individuals (i.e., 4,000 x 41 generations x 2
runs) is sufficient to yield a satisfactory result for this problem with
99% probability.
9.7 RESULTS WITH 20 FLOWERS WITH ADFs
For the bumblebee problem with 20 flowers, the average structural complexity, S, of the best-of-run programs from the 37 successful runs (out of 38) with
automatically defined functions is 225.0 points.
For the bumblebee problem with 20 flowers, figure 9.8 presents the performance curves based on the 38 runs of this problem with automatically defined
functions. The cumulative probability of success, P(M, i), is 92% by generation
32 and 97% by generation 50. The two numbers in the oval indicate that if this
problem is run through to generation 32, processing a total of E_with = 264,000
individuals (i.e., 4,000 x 33 generations x 2 runs) is sufficient to yield a satisfactory result for this problem with 99% probability.
For the bumblebee problem with 20 flowers, table 9.4 compares the average
structural complexity, S_without and S_with, and the computational effort, E_without
and E_with, with automatically defined functions and without them.
Figure 9.9, which summarizes the information in the table for the bumblebee problem with 20 flowers, shows a structural complexity ratio, R_S, of 1.72
and an efficiency ratio, R_E, of 1.24.
Figure 9.7 Performance curves for the bumblebee problem with 20 flowers showing that E_without = 328,000 without ADFs.
Figure 9.8 Performance curves for the bumblebee problem with 20 flowers showing that E_with = 264,000 with ADFs.
Table 9.4 Comparison table for the bumblebee problem with 20 flowers.
                                   Without ADFs    With ADFs
Average structural complexity S    386.9           225.0
Computational effort E             328,000         264,000
Figure 9.9 Summary graphs for the bumblebee problem with 20 flowers.
9.8 RESULTS WITH 15 FLOWERS WITHOUT ADFs
We then further scaled this problem down to only 15 flowers.
The average structural complexity, S, of the best-of-run programs from the
35 successful runs (out of 35 runs) without automatically defined functions is
328.4 points for the bumblebee problem with 15 flowers.
For the bumblebee problem with 15 flowers, figure 9.10 presents the performance curves based on the 35 runs of this problem without automatically
defined functions. The cumulative probability of success, P(M, i), is 100% by
generation 39. The two numbers in the oval indicate that if this problem is
run through to generation 39, processing a total of E_without = 160,000 individuals (i.e., 4,000 x 40 generations x 1 run) is sufficient to yield a satisfactory
result for this problem with 99% probability.
9.9 RESULTS WITH 15 FLOWERS WITH ADFs
For the bumblebee problem with 15 flowers, the average structural complexity, S, of the best-of-run programs from the 50 successful runs (out of 50 runs)
with automatically defined functions is 190.8 points.
For the bumblebee problem with 15 flowers, figure 9.11 presents the performance curves based on the 50 runs of this problem with automatically
defined functions. The cumulative probability of success, P(M, i), is 100% by
generation 32. The two numbers in the oval indicate that if this problem is
run through to generation 32, processing a total of E_with = 132,000 individuals
(i.e., 4,000 x 33 generations x 1 run) is sufficient to yield a satisfactory result
for this problem with 99% probability.
For the bumblebee problem with 15 flowers, table 9.5 compares the
average structural complexity, S_without and S_with, and the computational
effort, E_without and E_with, with automatically defined functions and without them.
Figure 9.12, which summarizes the information in the table for the bumblebee problem with 15 flowers, shows a structural complexity ratio, R_S, of 1.72
and an efficiency ratio, R_E, of 1.21.
Figure 9.10 Performance curves for the bumblebee problem with 15 flowers showing that E_without = 160,000 without ADFs.
Figure 9.11 Performance curves for the bumblebee problem with 15 flowers showing that E_with = 132,000 with ADFs.
Table 9.5 Comparison table for the bumblebee problem with 15 flowers.
                                   Without ADFs    With ADFs
Average structural complexity S    328.4           190.8
Computational effort E             160,000         132,000
Figure 9.12 Summary graphs for the bumblebee problem with 15 flowers.
9.10 RESULTS WITH 10 FLOWERS WITHOUT ADFs
Finally, we scaled this problem down to only 10 flowers.
The average structural complexity, S, of the best-of-run programs from the
35 successful runs (out of 35 runs) without automatically defined functions is
224.2 points for the bumblebee problem with 10 flowers.
For the bumblebee problem with 10 flowers, figure 9.13 presents the performance curves based on the 35 runs of this problem without automatically
defined functions. The cumulative probability of success, P(M, i), is 100% by
generation 28. The two numbers in the oval indicate that if this problem is
run through to generation 28, processing a total of E_without = 116,000 individuals (i.e., 4,000 x 29 generations x 1 run) is sufficient to yield a satisfactory
result for this problem with 99% probability.
9.11 RESULTS WITH 10 FLOWERS WITH ADFs
For the bumblebee problem with 10 flowers, the average structural complexity, S, of the best-of-run programs from the 33 successful runs (out of 33 runs)
with automatically defined functions is 150.9 points.
For the bumblebee problem with 10 flowers, figure 9.14 presents the performance curves based on the 33 runs of this problem with automatically
defined functions. The cumulative probability of success, P(M, i), is 100% by
generation 23. The two numbers in the oval indicate that if this problem is
run through to generation 23, processing a total of E_with = 96,000 individuals
(i.e., 4,000 x 24 generations x 1 run) is sufficient to yield a satisfactory result
for this problem with 99% probability.
For the bumblebee problem with 10 flowers, table 9.6 compares the average structural complexity, S_without and S_with, and the computational effort,
E_without and E_with, with automatically defined functions and without them.
Figure 9.15, which summarizes the information in the table for the bumblebee problem with 10 flowers, shows a structural complexity ratio, R_S, of 1.49
and an efficiency ratio, R_E, of 1.20.
Figure 9.13 Performance curves for the bumblebee problem with 10 flowers showing that E_without = 116,000 without ADFs.
Figure 9.14 Performance curves for the bumblebee problem with 10 flowers showing that E_with = 96,000 with ADFs.
Table 9.6 Comparison table for the bumblebee problem with 10 flowers.
                                   Without ADFs    With ADFs
Average structural complexity S    224.2           150.9
Computational effort E             116,000         96,000
Figure 9.15 Summary graphs for the bumblebee problem with 10 flowers.
9.12 SUMMARY FOR 10, 15, 20, AND 25 FLOWERS
Table 9.7 compiles the observations from the above experiments into one
table. As can be seen, for the bumblebee problem, the efficiency ratio, R_E,
is always greater than 1 (indicating that fewer fitness evaluations are
required to yield a satisfactory result for the problem with 99% probability
with automatically defined functions than without them). The structural
complexity ratio, R_S, is also always greater than 1 (indicating that the overall size of the solutions to the problem is smaller with automatically
defined functions than without them).
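Both ratios follow directly from the per-size comparison tables, as the following Python sketch shows. The dictionary layout and names are illustrative assumptions; the numbers are those reported in the preceding sections.

```python
# (without ADFs, with ADFs) for each number of flowers, as reported above.
structural_complexity = {
    10: (224.2, 150.9),
    15: (328.4, 190.8),
    20: (386.9, 225.0),
    25: (452.0, 245.9),
}
computational_effort = {
    10: (116_000, 96_000),
    15: (160_000, 132_000),
    20: (328_000, 264_000),
    25: (612_000, 192_000),
}

def ratio_table(measurements):
    """Ratio of the value without ADFs to the value with ADFs, per size."""
    return {size: without / with_adfs
            for size, (without, with_adfs) in measurements.items()}

r_s = ratio_table(structural_complexity)  # structural complexity ratio R_S
r_e = ratio_table(computational_effort)   # efficiency ratio R_E

for size in sorted(r_s):
    print(size, round(r_s[size], 2), round(r_e[size], 2))
```

Recomputed this way, the structural complexity ratios agree with the quoted values to two decimals; the efficiency ratios for 10 and 25 flowers come out near 1.21 and 3.19, so the quoted 1.20 and 3.20 may reflect a different rounding.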
9.13 SCALING WITH 10, 15, 20, AND 25 FLOWERS
This section examines the average structural complexity, S, and the computational effort, E, as a function of problem size for the bumblebee problem.
We first consider the average structural complexity, S, of the genetically
evolved solutions to the bumblebee problem, with and without automatically defined functions.
Table 9.8 consolidates the previously reported values of the average structural complexity for 10, 15, 20, and 25 flowers, with and without automatically defined functions for the bumblebee problem.
Figure 9.16 shows the average structural complexity, S, of solutions for 10,
15, 20, and 25 flowers, with and without automatically defined functions. The
graphs are approximately linear, both with and without automatically
defined functions; however, they are different.
When we perform a linear least-squares regression on the four points relating to the runs without automatically defined functions, we find that the structural complexity, S_without, can be stated in terms of the number of flowers, F, as
$$S_{without} = 88.21 + 14.84F,$$
with a correlation of 0.99. The slope is 14.84, so it takes an average of 14.84
points in the program tree to handle each additional flower.
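The fit without automatically defined functions can be reproduced with an ordinary least-squares sketch in Python. The helper below is a generic textbook formulation written for illustration, not code from the original experiments.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope, correlation) of the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return intercept, slope, correlation

flowers = [10, 15, 20, 25]
s_without = [224.2, 328.4, 386.9, 452.0]  # average structural complexity

intercept, slope, correlation = least_squares(flowers, s_without)
print(round(intercept, 2), round(slope, 2), round(correlation, 2))
```

With only four data points the correlation is easy to overstate, so the fitted slope is best read as a rough growth rate rather than a precise law.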
In contrast, when we perform a linear regression on the runs with automatically defined functions, we find that the structural complexity, S_with, can be stated in terms of the number of flowers, F, as
S_with = 90.26 + 6.44F,
with a correlation of 0.99. The vertical intercept of 90.26 here is only slightly larger than the intercept (88.21) without automatically defined functions. However, the slope of 6.44 here is only 43% of the slope (14.84) without automatically defined functions. That is, it takes an average of only 6.44 points in the program tree to handle each additional flower with automatically defined functions. As the size of the problem is scaled up, the size of the solutions seems to grow at less than half the rate with automatically defined functions than without them.
Table 9.7 Summary table of the structural complexity ratio, R_S, and the efficiency ratio, R_E, for the bumblebee problem with 10, 15, 20, and 25 flowers.
Problem                   Structural complexity ratio R_S   Efficiency ratio R_E
Bumblebee - 10 flowers    1.49                              1.20
Bumblebee - 15 flowers    1.72                              1.21
Bumblebee - 20 flowers    1.72                              1.24
Bumblebee - 25 flowers    1.84                              3.20
Table 9.8 Comparison of the average structural complexity of solutions of the bumblebee problem.
Number of flowers   10      15      20      25
S_without           224.2   328.4   386.9   452.0
S_with              150.9   190.8   225.0   245.9
Figure 9.16 Comparison of average structural complexity, S, of solutions to the bumblebee problem with 10, 15, 20, and 25 flowers, with and without ADFs.
We now consider the computational effort required for the bumblebee problem, with and without automatically defined functions.
Table 9.9 consolidates the values of computational effort for 10, 15, 20, and 25 flowers, with and without automatically defined functions, for the bumblebee problem.
Figure 9.17 shows the computational effort for 10, 15, 20, and 25 flowers, both with and without automatically defined functions. As can be seen, the values of the computational effort, E_without, without automatically defined functions (116,000, 160,000, 328,000, and 612,000) grow very rapidly with problem size.
When we perform a linear regression on the progression of values of computational effort, E_without, without automatically defined functions (116,000, 160,000, 328,000, and 612,000), we find that the computational effort, E_without, can be stated in terms of the number of flowers, F, as
E_without = -275,600 + 33,120F,
with a correlation of 0.95. It takes about 33,120 additional fitness evaluations to handle each additional flower without automatically defined functions.
When we perform a linear regression on the nonmonotonic progression of values of E_with obtained from the empirical data with automatically defined functions (96,000, 132,000, 264,000, and 192,000), we find that the computational effort, E_with, can be stated in terms of the number of flowers, F, as
E_with = 24,000 + 8,400F,
with a correlation of 0.74. This correlation of 0.74 is much smaller than we have seen in previous comparisons because of the nonmonotonicity of this particular set of observed data, the sparsity of data for doing the regression, the possible inappropriateness of the model, or a combination of these factors. Nonetheless, it takes only about 8,400 additional fitness evaluations to handle each additional flower with automatically defined functions. The slope of 8,400 with automatically defined functions is only about 25% of the slope (33,120) without automatically defined functions. That is, as the size of the problem is scaled up, the computational effort grows at roughly a quarter of the rate with automatically defined functions than without them.
Table 9.9 Comparison of computational effort for 10, 15, 20, and 25 flowers for the bumblebee problem.
Number of flowers   10        15        20        25
E_without           116,000   160,000   328,000   612,000
E_with              96,000    132,000   264,000   192,000
Figure 9.17 Comparison of computational effort for 10, 15, 20, and 25 flowers, with and without ADFs.
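As a check on the arithmetic, the two least-squares fits of computational effort against the number of flowers can be reproduced from the four data points in table 9.9. The following Python sketch is illustrative only; the helper name linear_fit is hypothetical, and the code is not part of the implementation used for the runs reported here.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return intercept, slope, correlation


# Data points taken from table 9.9.
flowers = [10, 15, 20, 25]
e_without = [116_000, 160_000, 328_000, 612_000]
e_with = [96_000, 132_000, 264_000, 192_000]

a_without, b_without, r_without = linear_fit(flowers, e_without)
a_with, b_with, r_with = linear_fit(flowers, e_with)

print(f"E_without = {a_without:,.0f} + {b_without:,.0f}F  (r = {r_without:.2f})")
print(f"E_with    = {a_with:,.0f} + {b_with:,.0f}F  (r = {r_with:.2f})")
print(f"slope ratio = {b_with / b_without:.2f}")
```

With only four points per fit, the correlation is easily disturbed by a single nonmonotonic value, which is what happens for E_with.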
9.14 WALLCLOCK TIME FOR THE BUMBLEBEE PROBLEM
Table 9.10 shows the wallclock ratio, R_W, for the bumblebee problem with 10, 15, 20, and 25 flowers. As can be seen, the four wallclock ratios are each greater than 1, indicating that the runs with automatically defined functions require less wallclock time than the runs without automatically defined functions. Thus, automatically defined functions are beneficial both in terms of computational effort and wallclock time for this problem.
There are various advantages, disadvantages, and common attributes to measuring computational burden by means of E_without, E_with, and R_E as opposed to measuring it by means of W_without, W_with, and R_W.
The major advantages of the computational effort, E, as a measure of computational burden are that it provides a hardware-independent, software-independent, and algorithm-independent way of comparing the performance of adaptive algorithms. These advantages derive from the fact that E treats all fitness evaluations equally. These advantages go hand in hand with the major disadvantage of E: it ignores differences in elapsed wallclock time.
The major advantage of wallclock time as a measure of computational burden is that it speaks directly to the management of computer resources. It
directly reflects the different sizes, shapes, and contents of the program trees
evolved by genetic programming. Wallclock time has the disadvantage of
being algorithm-dependent, hardware-dependent, and software-dependent.
Both groups of measures have the desirable attribute of explicitly incorporating unsuccessful runs in the measurement of the performance of the
algorithm.
Both groups of measures share several undesirable attributes. They are time-consuming to compute; they are retrospective in nature; they are sometimes very sensitive to small variations in the observed data (especially when the probability of success is high); and they are sensitive to the choice of G (especially when the probability of success is low).
Table 9.10 Wallclock ratios, R_W, for the bumblebee problem.
Problem size   Wallclock ratio R_W
10             1.008
15             1.522
20             1.820
25             3.576
The algorithm-independence of E arises from the fact that fitness evaluations lie at the heart of every adaptive algorithm. Fitness evaluations are
common to all adaptive algorithms (probabilistic and deterministic). Every adaptive algorithm starts with at least one point in the search space of
the problem. For example, simple hillclimbing algorithms and simulated
annealing typically start with a single point in the multidimensional search
space of the problem; neural net paradigms typically start with a single
vector in the search space of weight vectors; genetic methods typically
start with a population of chromosome strings or other structures from
the search space of the problem. Adaptive algorithms then iteratively evaluate the fitness of the current point(s) and use that information to create
new point(s). Not every new point created by an adaptive algorithm is
necessarily better (except in hill climbing algorithms). Nonetheless, the
goal of an adaptive algorithm is to travel through the search space of the
problem so as eventually to find better points in the search space. Focusing on fitness evaluations is usually informative because fitness evaluations are almost always computationally burdensome for interesting
problems. Moreover, fitness evaluations come from the nature of the problem, not the nature of the particular adaptive algorithm being used. The
algorithm-independence of fitness evaluations is desirable because it offers the possibility of comparing different adaptive algorithms.
The hardware-independence and software-independence of E arise from the fact that the computation of E is not specific to any particular piece of computing machinery or any particular programming language or operating system. We have used LISP machines (whose machine code is especially designed for LISP) for genetic programming whereas most other users have used general-purpose workstations. The use of a LISP machine undoubtedly facilitates execution of genetic programming when implemented in a manner based on the representation of a program as a parse tree. (It certainly also facilitates development of software for genetic programming.) A measure such as E permits direct comparison of our results with the results obtained by others using different platforms. (Our own runs of genetic programming have been made using four different configurations of LISP machines, so comparing wallclock time would be difficult even among our own runs.) There is no need to pay any attention to differences between the particular hardware or operating systems of the particular platforms when a measure such as E is used.
The most important disadvantage of E is that it treats all fitness evaluations and all individuals equally. The computational burden associated with the evaluation of fitness of different points in the search space can be different for several reasons. The computational burden, of course, depends on the specific content of the programs (e.g., a call to the cosine function is more time-consuming than a simple addition). In the case of genetic programming, the computational burden also depends, in part, on the size of the programs. It is comforting that the solutions produced by genetic programming with automatically defined functions tend to be smaller than the solutions produced without automatically defined functions. However, the solutions are unusual points in the search space and the computational burden of a run of genetic programming depends on the cumulative size, over all individuals in the population and over all generations, not on the size of the one solution (or the handful of solutions) that ultimately emerge on the final successful generation of the run. More important, E does not reflect the substantial extra cost associated with handling automatically defined functions. Another reason why counting the number of fitness evaluations might be misleading is that wallclock time reflects the actual time required to evaluate the fitness of the particular points in the search space of the problem that are actually visited by the adaptive algorithm. The trajectory of one adaptive algorithm may conceivably create a disproportionately large number of intermediate points whose fitness evaluations are extraordinarily burdensome whereas the trajectory of another algorithm may create candidates that may be evaluated more easily. For example, some trajectories for some problems may contain many infeasible points that might cause the simulation involved in the fitness evaluation of the problem to time out.
Wallclock time is, of course, an appropriate measure of computational burden for any adaptive algorithm. For a given problem, different adaptive algorithms trace different trajectories through the search space of the problem. These trajectories may differ as to both their generational length (i.e., number of iterations or cycles of the algorithms required to yield a solution) and the computational burden associated with the particular points along the trajectory actually traced by a particular algorithm through the search space. In many problems, certain points in the search space may take more time to process than others. For example, in many control problems, a trajectory containing many poor points may require more processing time than other trajectories.
For certain algorithms, the step of creating new points is computationally intensive. For example, back propagation requires a large number of calculations to convert the current single point in the search space into the next. In contrast, the computational burden associated with the step of creating a new point in the search space is extremely low with the conventional genetic algorithm operating on fixed-length strings (because crossover, mutation, and reproduction of strings are extremely fast and simple operations). This burden is somewhat greater for genetic programming than for the conventional genetic algorithm (but still low in comparison to the burden of the fitness evaluations for a nontrivial problem). Of course, genetic methods use a population of points, whereas most adaptive algorithms operate on just a single point at a time. For certain algorithms, there are certain fixed front-end or back-end costs.
Studying wallclock time is especially pertinent in connection with genetic programming because genetic programming differs from most other adaptive algorithms in that the individual points along the trajectory traced through program space have different sizes and shapes. If all other things are equal, a larger program will usually take more wallclock time to evaluate than a smaller program. In contrast, for most adaptive algorithms, the structure undergoing adaptation is fixed throughout the run. For example, in the conventional genetic algorithm, the structure undergoing adaptation is typically a fixed-length character string. In neural networks being trained using back propagation, the structure undergoing adaptation is a fixed-size vector containing the weights for the fixed number of connections in the neural network.
E_without, E_with, R_E, W_without, W_with, and R_W all share the advantage of explicitly recognizing the reality that not every run of the algorithm necessarily yields a solution (or a satisfactory result). Many adaptive algorithms (e.g., simulated annealing, back propagation) have explicit probabilistic steps that determine whether or when a particular run actually yields a solution. Other seemingly nonprobabilistic learning algorithms are so dependent on artifacts (e.g., the order of presentation of data) that their overall performance in solving problems must be regarded as effectively probabilistic. Many reports of the performance of these algorithms ignore or underplay this probabilistic nature by neglecting or dismissing the failed runs in discussions of the performance of the algorithm. The effectively probabilistic nature of many adaptive algorithms is also often masked by the presentation of problems that are so simple that the algorithms always seem to work.
One disadvantage of both groups of measures is that they are particularly sensitive to the vagaries of empirical data when the probability of success approaches 100%. We use E to illustrate this point. When P(M,i) is between 0.78 and 0.89, only three independent runs, R(z), are required; when P(M,i) is between 0.90 and 0.98, only two independent runs are required; and when P(M,i) is 0.99 or more, only one run is required. The effect of an observed value of 0.99 versus 0.98 for P(M,i) is that E decreases by a factor of 2. We sometimes see this abrupt drop in E when a problem that genetic programming seems to solve on every run encounters its first unsuccessful run (thus changing P(M,i) from 100% to some value below 0.99).
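The step-like sensitivity described above follows from the rounding-up in the formula for the number of independent runs. The sketch below assumes the definitions R(z) = ceiling(log(1 - z) / log(1 - P(M,i))) and I(M,i,z) = M (i + 1) R(z) with z = 0.99. The function names, the population size of 4,000, and the generation number 20 are illustrative assumptions, and a small tolerance is subtracted before rounding up to guard against floating-point noise at the exact boundaries.

```python
import math


def runs_required(p_success, z=0.99):
    """Number of independent runs R(z) needed to succeed with probability z."""
    if p_success <= 0.0:
        raise ValueError("no finite number of runs suffices when P(M,i) is 0")
    if p_success >= 1.0:
        return 1
    exact = math.log(1.0 - z) / math.log(1.0 - p_success)
    # The 1e-9 tolerance is an assumption of this sketch: it keeps rounding
    # noise at boundaries such as P(M,i) = 0.99 from adding a spurious run.
    return max(1, math.ceil(exact - 1e-9))


def individuals_to_process(pop_size, generation, p_success, z=0.99):
    """I(M,i,z) = M * (i + 1) * R(z) for cumulative success probability P(M,i)."""
    return pop_size * (generation + 1) * runs_required(p_success, z)


for p in (0.50, 0.85, 0.95, 0.98, 0.99):
    print(p, runs_required(p), individuals_to_process(4000, 20, p))
```

Moving P(M,i) from 0.98 to 0.99 at the same generation halves the result of individuals_to_process, which is the abrupt factor-of-2 change noted in the text.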
Both groups of measures are also particularly sensitive when P(M,i) is small. When P(M,i) is small, R(M,i,z) is large. In that regime, a small change in P(M,i) causes a large change in R(M,i,z).
In addition, both groups of measures depend on a reasonable choice of G. If G is too small for a given problem, the best generation i* may appear to be generation G (i.e., the last generation of the run). When this happens (especially when P(M,i) is small and R(M,i,z) is large), there is a question as to whether the true global minimum for I(M,i,z) or W(M,i,z) has been achieved (i.e., whether the apparent best generation is merely an artifact of an insufficiently large choice for G).
Both groups of measures share the disadvantage that they are ultimately based on a count of the number of occurrences of an all-or-nothing event (i.e., getting a result that satisfies the success predicate of the problem). The fact that no credit is given for progress toward a solution may be a very reasonable and realistic characteristic for a performance measure for an adaptive algorithm whose goal is to solve a problem. However, this fact makes such a performance measure very time-consuming to compute because it requires multiple successful runs. If P(M,i) is low, a large number of unsuccessful runs will continue through generation G for each successful run. Because the method is time-consuming, less data may be available than we might like for computing this measure (for a given amount of available computational resources).
A minor disadvantage of E is that it is not a complete measure of the computational burden. The computational burden of an adaptive algorithm depends on the effort required to initialize the algorithm, the number of new points created during a run, the effort required to create new points, the number of fitness evaluations made during the run, and the effort required to do those fitness evaluations. E does not specifically measure the computational burden of initialization. However, the computational burden required to initialize an adaptive algorithm is usually very small in relation to its other steps. In addition, adaptive algorithms vary in the way that the overall computational burden is divided between the step of creating new points and the step of evaluating the fitness of the created points. For example, the number of new points created during a run varies significantly from algorithm to algorithm. Some adaptive algorithms (e.g., simulated annealing and many neural network paradigms) create only one new point for each generation (cycle) of the algorithm. On the other hand, hillclimbing algorithms typically create multiple tentative new points on each generation, evaluate them all, and then select the best alternative as the new point in the search space. Parallel search algorithms and genetic algorithms create a large number of new points at each generation. In addition, the computational burden required to create the new points varies significantly from algorithm to algorithm. The creation of a new point is relatively burdensome for some adaptive algorithms (e.g., back propagation) but it is relatively easy and simple with others (e.g., genetic algorithms). However, the importance of these differences should not be exaggerated because the number of new points that are created by an adaptive algorithm on each generation is usually equal to (or at least proportional to) the number of fitness evaluations (because a fitness evaluation is associated with each new point). Thus, except for the initialization step, the computational burden associated with an adaptive algorithm ends up depending more or less directly on the number of fitness evaluations.
There are some uncertainties involved in measuring wallclock time on a LISP machine. Many of the key activities of the LISP machine on which we did the work reported in this book involve the creation of elaborate linked structures that represent LISP S-expressions, which causes CONSing. Our implementation of genetic programming relies heavily on dynamic memory allocation and memory reclamation. In LISP machines, memory cells that are no longer in use are reclaimed by means of periodic garbage collection. The time required for a run varies in part due to memory fragmentation that inevitably occurs as the amount of time since the machine was booted increases. In practice, many runs are necessarily made before the machine is rebooted. A consequence of these activities that are peculiar to LISP machines and our implementation of genetic programming is that the wallclock times for a given set of identical runs of genetic programming may vary substantially and unpredictably depending on several interrelated factors (e.g., the amount of time since rebooting) even if they all perform exactly the same computation. These uncertainties are themselves significantly related to the cumulative structural complexity of the programs in the population as a whole (the biomass).
One could circumvent these difficulties by inserting a counter inside the interpreter function in the kernel of the code for genetic programming. Since different operations take different amounts of time, the increments to this counter would be a function of the particular operation being performed. This approach would result in a reliable and repeatable machine-specific measure of wallclock time. This measure could even be considered machine-independent if there were agreement on a particular table of times for each of the primitive operations. However, maintaining this count would require that it be retrieved from memory, incremented, and stored for each terminal that is actually evaluated and each primitive function that is actually executed. This counter would slow down the run since retrieving the counter, adding the operation-specific increment, and storing the result must all be performed in the innermost loop.
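The idea of an operation-weighted counter inside the interpreter can be sketched in a few lines. The example below is a toy illustration in Python rather than the LISP kernel described in the text: programs are nested tuples, the primitive names are hypothetical, and the cost table holds made-up time units, not measurements from any machine.

```python
import math

# Hypothetical cost table (arbitrary time units per operation). The values
# are illustrative assumptions, not timings of any real machine.
COST = {"+": 1, "*": 1, "COS": 20, "terminal": 1}


def evaluate(expr, env, counter):
    """Evaluate a nested-tuple program, adding each operation's cost to counter[0]."""
    if not isinstance(expr, tuple):
        # A terminal: either a variable name looked up in env or a constant.
        counter[0] += COST["terminal"]
        return env[expr] if isinstance(expr, str) else expr
    op, *args = expr
    values = [evaluate(arg, env, counter) for arg in args]
    counter[0] += COST[op]  # the increment depends on the operation performed
    if op == "+":
        return values[0] + values[1]
    if op == "*":
        return values[0] * values[1]
    if op == "COS":
        return math.cos(values[0])
    raise ValueError(f"unknown primitive {op!r}")


program = ("+", ("*", "X", "X"), ("COS", "X"))
counter = [0]
result = evaluate(program, {"X": 0.0}, counter)
print(result, counter[0])
```

The counter is updated once per terminal and once per function call, which is exactly the overhead in the innermost loop that the text warns about.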
Another important factor in the amount of wallclock time required to measure fitness is the number of functions and terminals that are actually evaluated in the entire population of programs. The wallclock time is not directly proportional to the number of functions and terminals that are actually evaluated in a given program because different amounts of time may be required to evaluate the different functions and terminals. The number of functions and terminals that are actually evaluated in a given program is not always the same as the number of functions and terminals in the program (i.e., its structural complexity) because only part of a program may actually be executed and because multiple calls to automatically defined functions result in repeated references to all of the points represented in the bodies of those automatically defined functions. Partial execution may occur because of explicit conditional branching operations in the program, because many functions (e.g., the non-strict AND and OR functions) are defined so as to short-circuit the evaluation of some of their arguments when the outcome becomes established, and because programs are often terminated by the time-out limits imposed in fitness calculations involving simulations. The impact of branching operations and non-strict operators cannot necessarily be estimated by relying on averages since we frequently see the formation of large intron-like (i.e., ignored) structures in the program trees produced by genetic programming (section 25.13 of Genetic Programming). More important, when automatically defined functions are being used, the number of functions and terminals that are actually evaluated in a given program depends on the extent to which the result-producing branch calls other branches and the extent to which the function-defining branches hierarchically invoke other function-defining branches.
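The difference between structural complexity and the number of points actually evaluated can be made concrete with a toy Boolean program. The sketch below is a hypothetical Python illustration: the nested-tuple representation, the names RPB, ADFS, and ADF0, and the particular program are all assumptions chosen for the example. AND and OR short-circuit, and each call to ADF0 re-evaluates the body of the function-defining branch.

```python
def size(expr):
    """Number of points (functions and terminals) in a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(arg) for arg in expr[1:])


def run(expr, env, adfs, tally):
    """Evaluate expr; tally[0] counts every point that is actually evaluated."""
    tally[0] += 1
    if not isinstance(expr, tuple):
        return env[expr]
    op, *args = expr
    if op == "AND":  # non-strict: stops at the first false argument
        return all(run(arg, env, adfs, tally) for arg in args)
    if op == "OR":   # non-strict: stops at the first true argument
        return any(run(arg, env, adfs, tally) for arg in args)
    if op == "NOT":
        return not run(args[0], env, adfs, tally)
    # Otherwise: a call to a one-argument automatically defined function.
    arg_value = run(args[0], env, adfs, tally)
    return run(adfs[op], {**env, "ARG0": arg_value}, adfs, tally)


# Hypothetical two-branch program: one function-defining branch and one
# result-producing branch that calls it twice.
ADFS = {"ADF0": ("OR", "ARG0", ("NOT", "D0"))}
RPB = ("AND", ("ADF0", "D1"), ("ADF0", "D0"))

structural_complexity = size(RPB) + sum(size(body) for body in ADFS.values())


def evaluated_points(env):
    tally = [0]
    run(RPB, env, ADFS, tally)
    return tally[0]


print(structural_complexity)
print(evaluated_points({"D0": True, "D1": False}))   # AND short-circuits
print(evaluated_points({"D0": False, "D1": False}))  # ADF0 body runs twice
```

Depending on the inputs, the same program evaluates fewer points than its structural complexity (when the AND short-circuits) or more (when the body of ADF0 is traversed on both calls).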
10 The Increasing Benefits of ADFs as Problems are Scaled Up
Chapters 6, 8, and 9 focused on problems for which a progression of several scaled-up versions was considered.
Table 6.10 showed that both the efficiency ratio, R_E, and the structural complexity ratio, R_S, are greater than 1 for the even-4-, 5-, and 6-parity problems. Table 8.9 showed the same for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96. Table 9.7 showed the same for the bumblebee problem with 10, 15, 20, and 25 flowers. Accordingly, main points 3 and 4 stated that automatically defined functions reduce the computational effort required to solve these problems and usually improve the parsimony of the solutions produced by genetic programming.
The regression analyses (both linear and exponential) concerning the parity problem (section 6.15), the lawnmower problem (section 8.15), and the bumblebee problem (section 9.13) indicated that the average structural complexity increases as a function of problem size at a lower rate with automatically defined functions than without them (main point 5) and that the computational effort increases as a function of problem size at a lower rate with automatically defined functions than without them (main point 6).
The focus in this chapter changes from the values of S and E to the values of the two ratios, R_S and R_E. Specifically, this chapter explores the question of how the efficiency ratio, R_E, and the structural complexity ratio, R_S, in tables 6.10, 8.9, and 9.7 change as a function of problem size.
10.1 THE BENEFITS OF ADFs AS A FUNCTION OF PROBLEM SIZE
We first examine the efficiency ratios contained in the three tables.
Figure 10.1 plots the efficiency ratios, R_E, from table 6.10 as a function of the arity of the parity problem (excluding the ratio of 52.2 for the 6-parity problem in table 6.10, which is based on the rough estimate of section 6.6).
Figure 10.2 plots the efficiency ratios, R_E, from table 8.9 as a function of the size of the lawn in the lawnmower problem.
Figure 10.3 plots the efficiency ratios, R_E, from table 9.7 as a function of the number of flowers in the bumblebee problem.
Figure 10.1 Graph of efficiency ratio, R_E, for the even-parity problems.
Figure 10.2 Graph of efficiency ratio, R_E, for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96.
Figure 10.3 Graph of efficiency ratio, R_E, for the bumblebee problem with 10, 15, 20, and 25 flowers.
Figure 10.4 Graph of structural complexity ratio, R_S, for the even-parity problems.
Figure 10.5 Graph of structural complexity ratio, R_S, for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96.
Figure 10.6 Graph of structural complexity ratio, R_S, for the bumblebee problem with 10, 15, 20, and 25 flowers.
The efficiency ratios, R_E, increase strictly monotonically as the problem size increases. That is, the benefit conferred by automatically defined functions as to computational effort increases as problems are scaled up. This point is closely related to, but slightly different from, the subject of main points 3 and 6.
We now reexamine the structural complexity ratios contained in the same three tables.
Figure 10.4 plots the structural complexity ratios, R_S, from table 6.10 as a function of the arity of the parity problem (excluding the ratio of 1.77 for the even-6-parity problem, which is based on the rough estimate of section 6.6).
Figure 10.5 plots the structural complexity ratios, R_S, from table 8.9 as a function of the size of the lawn in the lawnmower problem.
Figure 10.6 plots the structural complexity ratios, R_S, from table 9.7 as a function of the number of flowers in the bumblebee problem.
The structural complexity ratios, R_S, increase monotonically (i.e., do not decrease) as the problem size increases. That is, the benefits as to parsimony conferred by automatically defined functions increase as problems are scaled up. This point is closely related to the subject of main points 4 and 5.
This evidence supports main point 7 of this book:
Main point 7: For the three problems herein for which a progression of
several scaled-up versions is studied, the benefits in terms of computational
effort and average structural complexity conferred by automatically defined
functions increase as the problem size is scaled up.
This main point is important because it suggests that the advantages of exploiting modularities by means of hierarchies become greater as problems become larger and more realistically sized.
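The distinction drawn above between a strictly increasing sequence of ratios and one that merely does not decrease can be checked directly on the bumblebee ratios of table 9.7. The following Python sketch is illustrative; the helper names are hypothetical.

```python
def strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def non_decreasing(values):
    """True if no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Ratios for 10, 15, 20, and 25 flowers, taken from table 9.7.
r_e = [1.20, 1.21, 1.24, 3.20]  # efficiency ratio R_E
r_s = [1.49, 1.72, 1.72, 1.84]  # structural complexity ratio R_S

print(strictly_increasing(r_e))                       # R_E rises at every step
print(non_decreasing(r_s), strictly_increasing(r_s))  # R_S has one tie
```

The tie between 15 and 20 flowers is why the structural complexity ratios are described as monotonic only in the sense of not decreasing.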
10.2 WALLCLOCK TIME
The evidence from the two problems for which wallclock time is computed for a progression of scaled-up versions (i.e., the lawnmower problem in section 8.16 and the bumblebee problem in section 9.14) also supports the conclusion that the advantages conferred by automatically defined functions increase as the problem size is scaled up. For both problems, the wallclock ratios, R_W, increase monotonically as the problem size increases.
Figure 10.7 plots the wallclock ratios, R_W, as a function of the lawn size in the lawnmower problem.
Figure 10.8 plots the wallclock ratios, R_W, as a function of the number of flowers in the bumblebee problem.
Figure 10.7 Graph of wallclock ratio, R_W, for the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96.
Figure 10.8 Graph of wallclock ratio, R_W, for the bumblebee problem with 10, 15, 20, and 25 flowers.
11 Finding an Impulse Response Function
In the foregoing chapters, information was explicitly transmitted to the genetically evolved reusable subprograms by explicit arguments or was implicitly transmitted to the subprograms by means of the state of the system. Information can be implicitly transmitted in another way: by global variables. This chapter presents a problem in which information is transmitted to the evolved subprograms in two ways: by a global variable and by an explicit argument.
The problem in this chapter is to find the impulse response function for
a linear time-invariant system. Martin A. Keane conceived the impulse
response problem (Keane, Koza, and Rice 1993) and we subsequently
applied automatically defined functions to this problem (Koza, Keane, and
Rice 1993).
The fact that the automatically defined functions in this problem are real-valued functions of a single variable permits the automatically defined functions to be visualized graphically. This, in turn, enables us to visualize, in some instances, the often-elusive connection between program structure and program performance in the problem domain. It also enables us to visualize the effect of crossover on program performance. Section 11.7 traces the genealogical audit trail of illustrative offspring produced by crossover in both the function-defining branch and the result-producing branch.
11.1 THE PROBLEM
For many problems in control engineering, it is desirable to find a function, such as the impulse response function or transfer function, for a system for which one does not have an analytical model.
In this chapter, genetic programming is used to find a good approximation, in symbolic form, to the impulse response function for a linear time-invariant system using only the observed discrete-time response of the system to a particular known forcing function.
The reader unfamiliar with control engineering should focus on the fact that we are searching the space of possible functions for a real-valued function that satisfies certain requirements, rather than on the engineering interpretation of the impulse response function.
Figure 11.1 A linear time-invariant system.
Figure 11.1 shows a linear time-invariant system (i.e., a plant) that sums the outputs of three major components. Each component consists of a pure time-delay element, a lag circuit containing a resistor and a capacitor, and a gain element. The first component of this system, for example, has a time delay of 6, a gain of +3, and a time constant, RC, of 5. For computational simplicity and without loss of generality, we use the discrete-time version of this system in this chapter.
In the problem of system identification, one is given the observed response of the unknown system to a particular known input. Figure 11.2 shows a particular square input, i(t), that rises from an amplitude of 0 to 1 at time 3 and falls back to an amplitude of 0 at time 23. It also shows the response, o(t), of the system when this square input is used as a forcing function.
The output of a linear continuous-time time-invariant system is given by the continuous-time convolution of the input, i(t), and the impulse response function, H(t). That is,
$$o(t) = \int_{-\infty}^{t} i(t - \tau)\, H(\tau)\, d\tau.$$
The output of a linear discrete-time time-invariant system is given by the discrete-time convolution
$$o(t) = \sum_{\tau = -\infty}^{t} i(t - \tau)\, H(\tau).$$
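The discrete-time convolution can be sketched in Python as follows. This is an illustration under stated assumptions, not the plant simulation used for the runs in this chapter. Only the first component of the plant is modeled (time delay 6, gain +3, time constant 5); the form of its discrete impulse response, the convention that the square input equals 1 from time 3 up to but not including time 23, the restriction of the sum to tau = 0, ..., t (both the input and the impulse response are taken to be zero for negative time), and the 60-step horizon are all assumptions of the sketch.

```python
def square_input(t, rise=3, fall=23):
    """Square forcing function: 1 for rise <= t < fall, otherwise 0."""
    return 1.0 if rise <= t < fall else 0.0


def lag_impulse_response(t, delay=6, gain=3.0, rc=5.0):
    """Assumed discrete impulse response of one delayed first-order lag."""
    if t < delay:
        return 0.0
    return gain * (1.0 - 1.0 / rc) ** (t - delay) / rc


def convolve(i_fn, h_fn, t):
    """o(t) = sum over tau = 0..t of i(t - tau) * H(tau)."""
    return sum(i_fn(t - tau) * h_fn(tau) for tau in range(t + 1))


# Response of the single-component system to the square input.
response = [convolve(square_input, lag_impulse_response, t) for t in range(60)]

for t in (8, 9, 28, 40):
    print(t, round(response[t], 4))
```

The response stays at zero until the input has passed through the delay, rises toward the gain of 3 while the input is high, and decays after the input falls.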
Figure 11.2 Plant response when a square input is the forcing function.
Figure 11.3 Impulse response function, H(t).
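The impulse response function, H(t), for the system above is known to be
$$H(t) = \begin{cases} 0 & \text{if } t < 6 \\ \dfrac{3\left(1 - \frac{1}{5}\right)^{t-6}}{5} & \text{otherwise} \end{cases} \;+\; \begin{cases} 0 & \text{if } t < 15 \\ \dfrac{-8\left(1 - \frac{1}{12}\right)^{t-15}}{12} & \text{otherwise} \end{cases} \;+\; \begin{cases} 0 & \text{if } t < 25 \\ \cdots & \text{otherwise} \end{cases}$$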
Figure 11.3 shows the impulse response function, H(t), for the system.
We now show how an approximation to this impulse resPonse can be discovered by genetic progranuning using just the observed discrete-time timedomain response to the square forcing function. The discrete-time version of
the square input and the system's discrete-time time-domain response to the
square input shown in figure 1L.2 arcthe grvens in this problem; the goal is to
find a good approximation to the impulse response of figure 1"L.3.
11.2 PREPARATORY STEPS WITHOUT ADFs
The candidate impulse response functions are compositions of the primitive functions and terminals of the problem. The single independent variable in the impulse response function is the time, t. In addition, the impulse
response function may contain numerical constants. Thus, the terminal
set, \mathcal{T}, for this problem consists of
    \mathcal{T} = \{ T, \Re_{bigger-reals} \},
where the floating-point random constant, \Re_{bigger-reals}, ranges between
-10.000 and +10.000 (with a granularity of 0.001).
For this problem, knowledge of control engineering suggests that the function set might consist of some kind of decision-making operator, the four
arithmetic operations, and the exponential function. Thus, the function set, \mathcal{F},
for this problem is
    \mathcal{F} = \{ +, -, *, \%, EXPP, IFLTE \}
with an argument map of
    {2, 2, 2, 2, 1, 4}.
The protected division function % (section 4.2) protects against the possibility of division by zero. However, the potential of an overflow or underflow
arising from the creation of extremely large or small floating-point values
always exists whenever arithmetic operations are performed on a computer.
The presence of the exponential function in the function set of this particular
problem guarantees the creation of extreme values. Extreme values may be
created by the exponential function alone, by one of the four ordinary arithmetic functions operating on values returned by the exponential function, or
even by the arithmetic functions alone. Thus, it is necessary to protect all four
arithmetic functions so that if the absolute value of the result is greater than
some very large value or less than some very small value, then some nominal
value (with the appropriate sign) is instead returned. This protection can be
provided either by writing magnitude-protected versions of all four arithmetic operations or by trapping the overflow or underflow errors in a manner appropriate to the computer being used. Ordinarily, the one-argument
protected exponential function EXPP returns the numerical result obtained
by raising e to the power indicated by its one argument, and the two-argument arithmetic operations of +, -, *, and % return the numerical result obtained by performing these operations. However, whenever the absolute value
of the result of evaluating any of these five functions falls outside the limits of the
machine (i.e., above about 10^38 or below about 10^-38 when floating-point numbers are used on
our Texas Instruments Explorer II+ computer), then some nominal value (10^10
or 10^-10, respectively, with the appropriate sign) is instead returned.
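The magnitude protection described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the code used on the Explorer: the names `clamp`, `padd`, `psub`, `pmul`, `pdiv`, and `expp` are hypothetical, an exact zero result is assumed to be left untouched, and protected division is assumed to return 1 when the denominator is zero.

```python
import math

LIMIT_BIG = 1e38       # approximate machine limits quoted in the text
LIMIT_TINY = 1e-38
NOMINAL_BIG = 1e10     # nominal replacement values quoted in the text
NOMINAL_TINY = 1e-10


def clamp(x):
    """Replace an out-of-range magnitude by a nominal value of the same sign."""
    if x == 0.0:
        return 0.0  # assumption: an exact zero is left alone
    sign = 1.0 if x > 0.0 else -1.0
    if abs(x) > LIMIT_BIG:
        return sign * NOMINAL_BIG
    if abs(x) < LIMIT_TINY:
        return sign * NOMINAL_TINY
    return x


def padd(a, b):
    return clamp(a + b)


def psub(a, b):
    return clamp(a - b)


def pmul(a, b):
    return clamp(a * b)


def pdiv(a, b):
    # Assumed convention: protected division returns 1 for a zero denominator.
    return 1.0 if b == 0.0 else clamp(a / b)


def expp(x):
    try:
        value = math.exp(x)
    except OverflowError:  # math.exp raises for very large arguments
        return NOMINAL_BIG
    return NOMINAL_TINY if value < LIMIT_TINY else clamp(value)
```

Writing the protection as a wrapper around each operation corresponds to the first of the two options mentioned in the text; trapping hardware overflow and underflow errors would be the alternative.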
The four-argument conditional branching function IFLTE ("If Less Than
or Equal") evaluates and returns its third argument if its first argument is less
than or equal to its second argument and otherwise evaluates and returns its
fourth argument. For example, (IFLTE 2.0 3.5 A B) evaluates to the
value of A.
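Because IFLTE evaluates only the branch it selects, a Python rendering needs to delay evaluation of the third and fourth arguments. The sketch below does this with zero-argument callables; `iflte`, `branch`, and `evaluated` are hypothetical names chosen for illustration.

```python
def iflte(a, b, then_branch, else_branch):
    """Four-argument "if less than or equal".

    then_branch and else_branch are zero-argument callables, so only the
    selected branch is ever evaluated.
    """
    return then_branch() if a <= b else else_branch()


evaluated = []


def branch(name):
    # Build a thunk that records when it is evaluated.
    def thunk():
        evaluated.append(name)
        return name
    return thunk


# Mirrors the example (IFLTE 2.0 3.5 A B) from the text.
result = iflte(2.0, 3.5, branch("A"), branch("B"))
```

After this call `result` holds the value of the A branch and the B branch has never run, which matters when a branch is expensive or has side effects.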
Each computer program in the population is a composition of primitive
functions from the function set, \mathcal{F}, and terminals from the terminal set, \mathcal{T}.
Of course, if we had some knowledge about the specific plant being analyzed that suggested the utility of certain other functions (e.g., sine), we could
also have included those functions in \mathcal{F} as well.
For this problem, the fitness of an individual impulse response function in
the population is measured in terms of the difference between the known
observed discrete-time time-domain response of the system to a particular
forcing function and the response computed by convolving the individual
impulse response function and the forcing function. The smaller the difference, the better. The exact impulse response of the system would yield a difference of zero.
Specifically, each individual in the population is tested against a simulated environment consisting of N_fc = 60 fitness cases, each representing
the output, o(t), of the given system for various times between 0 and 59
when the square input, i(t), is used as the forcing function for the system.
The fitness of any given impulse response function, G(t), in the population is the sum, over the 60 fitness cases, of the squares of the differences
between the observed response, o(t), of the system to the forcing function,
i(t), (i.e., the square input) and the response computed by convolving the
forcing function, i(t), and the given genetically evolved impulse response,
G(t). That is, the fitness is

    f(G) = \sum_{i=1}^{N_{fc}} \left[ o(t_i) - \sum_{\tau=-\infty}^{t_i} i(t_i - \tau) G(\tau) \right]^2.
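The fitness measure just described can be sketched directly: convolve the candidate impulse response with the forcing function and accumulate the squared error against the observed response. The names below are hypothetical, the convolution sum is restricted to tau = 0..t on the assumption of causality, and `stand_in_plant` is an invented impulse response used only to manufacture observed data for the example (it is not the plant of this chapter).

```python
def predicted_response(forcing, candidate, t):
    # Discrete convolution at time t, summing over tau = 0..t (assumes causality).
    return sum(forcing(t - tau) * candidate(tau) for tau in range(t + 1))


def fitness(candidate, observed, forcing):
    """Sum of squared errors over the fitness cases t = 0..len(observed) - 1."""
    return sum(
        (o_t - predicted_response(forcing, candidate, t)) ** 2
        for t, o_t in enumerate(observed)
    )


def square_input(t):
    # Square input of the text; the boundary at time 23 is an assumption.
    return 1.0 if 3 <= t < 23 else 0.0


def stand_in_plant(t):
    # Hypothetical impulse response used only to fabricate "observed" data here.
    return 0.0 if t < 6 else 0.6 * 0.8 ** (t - 6)


observed = [predicted_response(square_input, stand_in_plant, t) for t in range(60)]
exact = fitness(stand_in_plant, observed, square_input)
flat = fitness(lambda t: 0.0, observed, square_input)
```

As expected, the candidate that equals the plant's impulse response scores a fitness of zero, while the all-zero candidate scores the sum of the squared observations.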
Our choice of 4,000 as the population size and our choice of 51 as the maximum number of generations to be run reflect an estimate on our part of the
likely complexity of this problem and the limitations of available computer
time and memory.
Table 11.1 summarizes the key features of the impulse-response problem
without automatically defined functions.
11.3 RESULTS OF ONE RUN WITHOUT ADFs
A review of one particular run will serve to illustrate how genetic
programming progressively approximates the desired impulse response
function.
One would not expect any individual from the randomly generated initial
population to be very good. In generation 0, the fitness of the worst impulse
response function in the population is very poor; its fitness is the enormous
value of 4.7 x 10^[...]. This worst-of-generation individual consisted of seven
points and is
(- T (* (EXPP T) (EXPP T))),
which is equivalent to
    t - e^{2t}.
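The point count and the algebraic equivalence quoted for this individual can be checked with a small S-expression reader. The sketch below is illustrative only: `parse`, `count_points`, and `evaluate` are hypothetical helpers, only the three functions needed for this one expression are handled, and the arithmetic is deliberately unprotected.

```python
import math


def parse(text):
    """Parse one LISP S-expression into nested Python lists of atoms."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        try:
            return float(tokens[pos]), pos + 1
        except ValueError:
            return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree


def count_points(node):
    """Count every function and terminal occurrence in the tree."""
    if isinstance(node, list):
        return sum(count_points(child) for child in node)
    return 1


def evaluate(node, t):
    """Evaluate using ordinary (unprotected) arithmetic; T is the time."""
    if isinstance(node, float):
        return node
    if node == "T":
        return t
    op, *args = node
    values = [evaluate(arg, t) for arg in args]
    if op == "-":
        return values[0] - values[1]
    if op == "*":
        return values[0] * values[1]
    if op == "EXPP":
        return math.exp(values[0])
    raise ValueError(f"unsupported function: {op}")


worst = parse("(- T (* (EXPP T) (EXPP T)))")
```

Here `count_points(worst)` gives the seven points mentioned in the text, and `evaluate(worst, t)` agrees with t - e^{2t} for moderate values of t.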
Table 11.1 Tableau without ADFs for the impulse-response problem.

Objective:                  Find a program that approximates the impulse
                            response function of a three-component time-invariant
                            linear system.
Terminal set without ADFs:  The time T and the random constants \Re_{bigger-reals}.
Function set without ADFs:  +, -, *, %, EXPP, and IFLTE.
Fitness cases:              60 consecutive integral values of time T between 0
                            and 59.
Raw fitness:                The sum of the squares of the differences between the
                            observed response, o(t), of the system to the forcing
                            function, i(t) (i.e., the square input), and the response
                            computed by convolving the forcing function, i(t), and
                            the genetically produced impulse response, G(t).
Standardized fitness:       Same as raw fitness.
Hits:                       The number of fitness cases for which the response to
                            the square input of the genetically produced individual
                            comes within 0.5 of the plant response.
Wrapper:                    None.
Parameters:                 M = 4,000. G = 51.
Success predicate:          A program has a value of fitness of 20.00 or less over
                            the 60 fitness cases.
The fitness of the worst 40% of the population for generation 0 is 10^10 (or
worse).
The median individual for generation 0 is, when simplified, equivalent to
    -9.667 - 2.407t
and has a fitness of 10,260,473.
The fitness of the best impulse response function of generation 0 is 93.7
(i.e., an average squared error of about 1.56 for each of the 60 fitness cases).
This program has seven points and is
(% (% -2.46 T) (+ T -9.636)),
which is equivalent to
    \frac{-2.46 / t}{t - 9.636} = \frac{-2.46}{t^2 - 9.636 t}.
Figure 11.4 compares the best-of-generation impulse response function from
generation 0 and the correct impulse response, H(t), for the system. As can be
seen, there is little resemblance between this best of generation 0 and the correct impulse response. Indeed, the signs of the values returned by the best of
generation 0 are incorrect for almost every value of time.
Figure 11.4 Comparison of the best of generation 0 without ADFs (whose fitness is 93.7) with
the correct impulse response function.
In successive generations, the fitness of the worst-of-generation individual
in the population, the median individual, and the best-of-generation program
all tend to progressively improve (i.e., drop). In addition, the average fitness
of the population as a whole tends to improve. The fitness of the best-of-generation program drops to 81.88 for generation 3, 76.09 for generation 5,
70.65 for generation 7, and 48.26 for generations 8 and 9. Of course, the vast
majority of individual computer programs in the population are still very
poor.
By generation 10, the fitness of the best-of-generation program improves to
40.02. This individual has 111 points and is shown below:
(IFLTE (EXPP (IFLTE T T T 2.482))
       (EXPP (+ (% 9.39 T)
                (IFLTE (IFLTE T 9.573 T -6.085) (% T 0.217001) (EXPP -5.925) (% T T))))
       (EXPP (* (- (+ T -4.679) (EXPP T))
                (IFLTE (% -5.631 T) (% -1.675 -1.485) (+ T 2.623) (EXPP T))))
       (% (EXPP (- T T))
          (- (+ (* -1.15399 -5.332)
                (% (% (* (IFLTE -8.019 T 0.338 T) (% T 8.571))
                      (- (* (- 1.213 T) (+ (EXPP T) (+ 7.605 6.873)))
                         (IFLTE (+ T T) (% -5.149 T) (+ T T) (- T T))))
                   (* T 6.193)))
             (IFLTE (% (EXPP T) (EXPP (* -3.817 T)))
                    (* T 6.193)
                    (- -8.022 7.743)
                    (+ T -9.464))))).
Note the subexpression (- -8.022 7.743) near the end of the program, which evaluates to -15.765. The value -15.765 was evolved from the floating-point random constants -8.022 and 7.743 originally created in generation 0.
Figure 11.5 compares the genetically evolved best-of-generation impulse
response function from generation 10 and the correct impulse response. As
can be seen, this individual bears some resemblance to the correct impulse
response for the system.
As one proceeds from generation 20 to 30 and to 40, the fitness of the best
program in the population improves from 19.85 to 12.37 and to 6.97.
By generation 50, the best-of-generation program shown below has 286
points and a fitness value of 5.22 (i.e., a mean squared error of only about 0.087
for each of the 60 fitness cases):
(IFLTE (EXPP (IFLTE T T T 2.482)) (EXPP (- -8.022 7.743)) (EXPP (*
(- (% (% (* (IFLTE -8.019 T 0.338 T) (- -5.392 T)) T) (% (* (EXPP
(EXPP -5.221)) (IFLTE (* T T) (IFLTE (% -5.631 T) (% -1.675 -1.485)
(+ T 2.623) (EXPP T)) (- 9.957 -4.115) (% -8.978 T))) (- (IFLTE
(EXPP T) T 1.1 (EXPP 2.731)) (% (* (* -3.817 T) (% T 8.571)) (IFLTE
-8.019 T 0.338 T))))) (EXPP (IFLTE T 9.573 T -6.085))) (IFLTE (%
-5.631 T) (% -1.675 -1.485) (+ T 2.623) (EXPP T)))) (% (EXPP (IFLTE
-8.019 T 0.338 T)) (- (+ (* -1.15399 -5.332) (% (% (* (IFLTE -8.019
T 0.338 T) 8.571) (% T 8.571)) (- (+ (* -1.15399 -5.332) (% (% (*
(IFLTE -8.019 T 0.338 T) (% T 8.571)) T) (% (* (EXPP (EXPP -5.221))
[...] -8.022 7.743)) (% -1.675 -1.485) (+
T 2.623) (EXPP T)) (- 9.957 -4.115) (% -8.978 T))) (- (IFLTE (EXPP
T) T 1.1 (EXPP 2.731)) (% (- -8.022 7.743) (EXPP 2.731))))))
(IFLTE (% (EXPP T) (- 9.957 -4.115)) (* T 6.193) (IFLTE (% -5.631
T) (% -1.675 -1.485) (+ T 2.623) (EXPP T)) (+ T -9.464))))) (IFLTE
(% (EXPP T) (- -8.022 7.743)) (* T 6.193) (IFLTE (% (EXPP T) (EXPP
(* -3.817 T))) (- (IFLTE (+ T -4.679) (- -5.392 T) 1.1 (EXPP
2.731)) (% (+ T T) (* -1.15399 -5.332))) (- -8.022 (% -8.022 (- (*
(* (* (- 1.213 T) 0.217001) (% -5.631 T)) (+ (EXPP T) (- (IFLTE (+
T -4.679) (- -5.392 T) 1.1 (EXPP 2.731)) (EXPP (% T 0.217001)))))
(IFLTE (+ T T) T (* T 6.193) (- T T))))) (+ T -9.464)) (+ T
-9.464))))).
Figure 11.5 Comparison of the best of generation 10 without ADFs (whose fitness is 40.02)
with the correct impulse response function.
Figure 11.6 Comparison of the best of generation 50 (whose fitness is 5.22) without ADFs with
the correct impulse response function.
Figure 11.7 Response of best-of-generation programs from generations 0, 10, and 50 to the
square input.
Figure 11.6 compares the genetically produced best-of-generation
impulse response function from generation 50 and the correct impulse
response.
The above impulse response function is not by any means the exact
impulse response function, H(t), for the system. However, this genetically
created impulse response function is very good (although it may not appear
so at first glance).
Since the fitness measure is actually based on the time-domain response of
the system to the square input, the performance of the genetically produced
best-of-generation impulse response functions from generations 0, 10, and 50
can be better appreciated by examining the computed time-domain responses
of the system to the square input for these individuals.
Figure 11.8 Response of the best-of-generation programs from generations 0, 10, and 50 to the
ramp input.
Figure 11.7 compares the plant response (which is the same in all three
panels of this figure) to the square input and the response to the square input
using the best-of-generation impulse response functions from generations 0,
10, and 50. As can be seen in the first and second panels of this figure, the
best-of-generation programs from generations 0 and 10 do not perform very
well, although generation 10 is considerably better than generation 0. However, as can be seen in the third panel of this figure, the performance of the
best of generation 50 is close to the plant response (the total squared error
being only 5.22 over the 60 fitness cases).
If we define a hit to be any fitness case (out of the 60) for which the time-domain response to the square input of the genetically produced individual
comes within 0.5 of the plant response, then the number of hits improves
from 17 for the best of generation 0, to 29 for the best of generation 10, and to
54 for the best of generation 50.
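Counting hits amounts to a thresholded comparison of two response sequences. A minimal sketch follows; `count_hits` is a hypothetical name, the sample data are invented for illustration, and whether "within 0.5" includes a difference of exactly 0.5 is an assumption (it is included here).

```python
def count_hits(plant_response, model_response, tolerance=0.5):
    """Number of fitness cases where the model is within `tolerance` of the plant."""
    return sum(
        1
        for plant, model in zip(plant_response, model_response)
        if abs(plant - model) <= tolerance
    )


# Invented four-case example: two cases are within 0.5, two are not.
plant = [0.0, 1.0, 2.0, 3.0]
model = [0.1, 1.6, 2.0, 2.4]
hits = count_hits(plant, model)
```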
Control system performance is often characterized in terms of response to
certain forcing functions (input signals) such as ramps and steps. Accordingly,
the performance of the genetically evolved impulse response function can
be further demonstrated by considering four additional forcing functions: a
ramp input, a unit-step input, a shorter unit-square input, and a noise
signal.
Figure 11.8 shows the plant response to a particular unit ramp input
(whose amplitude is 0 between times 0 and 3, whose amplitude linearly
ramps up from 0 to 1 between times 3 and 23, and whose amplitude is 1
between times 23 and 59). It also shows the response to this ramp input
using the best-of-generation programs from generations 0, 10, and 50. As
can be seen, the performance of the best of generation 50 is close to the
plant response for the ramp input (the total squared error being only 7.2
over the 60 fitness cases).
Figure 11.9 compares the plant response to a particular unit-step input
(where the amplitude of the input steps up from 0 to 1 at time 3) and the
response to the step input using the best-of-generation programs from generations 0, 10, and 50. The performance of the best of generation 50 is also
close to the plant response for the unit-step input (the total squared error
being only 12.9 over the 60 fitness cases).
Figure 11.10 compares the plant response to a particular short unit-square
input (whose amplitude steps up from 0 to 1 at time 15 and steps down at
time 23) and the response to this short unit-square input using the best-of-generation programs from generations 0, 10, and 50. As can be seen, the performance of the best of generation 50 is also close to the plant response for
this short unit-square input (the total squared error being only 17.1 over the
60 fitness cases).
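The three deterministic test inputs described above can be written as simple functions of the integer time step. This is a sketch with hypothetical function names; the treatment of the boundary instants (for example, whether the short square input is already 0 at time 23) is an assumption, since the text gives only the transition times.

```python
def ramp_input(t):
    # 0 up to time 3, linear rise from 0 to 1 between times 3 and 23, then 1.
    if t < 3:
        return 0.0
    if t < 23:
        return (t - 3) / 20.0
    return 1.0


def step_input(t):
    # Steps up from 0 to 1 at time 3.
    return 1.0 if t >= 3 else 0.0


def short_square_input(t):
    # Steps up at time 15 and back down at time 23 (boundary assumed).
    return 1.0 if 15 <= t < 23 else 0.0


ramp = [ramp_input(t) for t in range(60)]
```

Any of these functions can be convolved with a candidate impulse response in the same way as the original square input.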
Figure 11.11 shows a noise signal which we will use as the forcing function
for our fourth and final test. The random values in the range [0,1] are
obtained for each time step by a separate call to the Park-Miller randomizer
(Park and Miller 1988).
Figure 11.9 Response of the best-of-generation programs from generations 0, 10, and 50 to the
unit-step input.
Figure 11.10 Response of the best-of-generation programs from generations 0, 10, and 50 to the
short unit-square input.
Figure 11.11 Noise signal.
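For readers unfamiliar with it, the Park-Miller "minimal standard" generator is the multiplicative congruential recurrence x(n+1) = 16807 x(n) mod (2^31 - 1). The sketch below shows one way a noise signal with one value per time step might be produced from it. The names `next_state` and `noise_signal`, the seed, and the scaling of the integer state into the unit interval are assumptions; the text does not say how the values for figure 11.11 were seeded or scaled.

```python
MULTIPLIER = 16807          # 7**5
MODULUS = 2**31 - 1         # 2147483647, a Mersenne prime


def next_state(state):
    """One step of the Park-Miller "minimal standard" generator."""
    return (MULTIPLIER * state) % MODULUS


def noise_signal(seed, n_steps):
    """One value per time step, scaled by the modulus to lie in (0, 1)."""
    values, state = [], seed
    for _ in range(n_steps):
        state = next_state(state)
        values.append(state / MODULUS)
    return values


signal = noise_signal(seed=1, n_steps=60)
```

Python integers do not overflow, so the product can be reduced directly; implementations restricted to 32-bit arithmetic need a factored form of the same recurrence.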
Figure 11.12 compares the plant response to this noise signal and the
response to the noise signal using the best-of-generation programs from generations 0, 10, and 50. The performance of the best of generation 50 is also
close to the plant response for the noise signal (the total squared error being
only 18.3 over the 60 fitness cases).
Genetic programming is well suited to control problems where the exact
solution is not known and where engineers do not expect, as a practical matter, to achieve the actual optimal solution. The solution to a problem produced by genetic programming is not just a numerical solution applicable to
a single specific combination of numerical input(s), but, instead, comes in the
form of a function in symbolic form (i.e., a computer program). As can be
seen, we have evolved an impulse response function which closely models
the output behavior of the unknown system when the system is presented
with a variety of inputs.
Note that we did not pre-specify the size and shape of the result. We did
not specify that the result obtained in generation 50 would have 285 points.
As we proceed from generation to generation, the size and shape of the best-of-generation programs change as a result of the selective pressure exerted
by Darwinian natural selection and crossover.
11.4 RESULTS OF SERIES OF RUNS WITHOUT ADFs
Over a series of 28 runs, the average structural complexity, S_without, of
the best-of-run programs from the 16 successful runs (out of 28 runs)
of the impulse-response problem is 285.9 points without automatically
defined functions.
Figure 11.13 presents the performance curves based on the 28 runs of the
impulse-response problem without automatically defined functions. The
cumulative probability of success, P(M,i), is 50% by generation 39 and 57%
by generation 50. The two numbers in the oval indicate that if this problem is
run through to generation 39, processing a total of E_without = 1,120,000
individuals (i.e., 4,000 x 40 generations x 7 runs) is sufficient to yield a satisfactory result for this problem with 99% probability.
Figure 11.12 Response of the best-of-generation programs from generations 0, 10, and 50 to
the noise signal.
Figure 11.13 Performance curves for the impulse-response problem showing that
E_without = 1,120,000 without ADFs.
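The arithmetic behind the figure of 1,120,000 can be reproduced as follows. If each independent run succeeds by generation i with probability P, then R runs fail together with probability (1 - P)^R, so the smallest R giving at least a 99% chance of one success is the ceiling of log(1 - 0.99) / log(1 - P). The function names below are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """Smallest R with 1 - (1 - p_success)**R >= z, for independent runs."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(pop_size, generation, p_success, z=0.99):
    # Generations are numbered from 0, so generation i means i + 1 generations.
    return pop_size * (generation + 1) * runs_required(p_success, z)


effort = individuals_to_process(pop_size=4000, generation=39, p_success=0.5)
```

With P = 0.5 at generation 39 this gives 7 runs and 4,000 x 40 x 7 = 1,120,000 individuals, matching the value quoted above.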
11.5 PREPARATORY STEPS WITH ADFs
In applying genetic programming using automatically defined functions to
this problem, we wanted to have an automatically defined function with
multiple arguments. However, because of the many time steps and the convolution, this problem is already very time-intensive. Therefore, we compromised and decided that each overall program in the population will have one
one-argument automatically defined function and that the independent variable of the problem, T, will be available to the automatically defined function
as a global variable.
The terminal set, \mathcal{T}_{adf}, for the automatically defined function ADF0 is
    \mathcal{T}_{adf} = \{ T, ARG0, \Re_{bigger-reals} \}.
The function set, \mathcal{F}_{adf}, for the function-defining branch is the same as
before, namely
    \mathcal{F}_{adf} = \{ +, -, *, \%, EXPP, IFLTE \}
with the same argument map, namely
    {2, 2, 2, 2, 1, 4}.
The terminal set, \mathcal{T}_{rpb}, for the body of the result-producing branch does not
contain the dummy variable ARG0 and is simply
    \mathcal{T}_{rpb} = \{ T, \Re_{bigger-reals} \}.
Table 11.2 Tableau with ADFs for the impulse-response problem.

Objective:                  Find a program that approximates the impulse
                            response function of a three-component time-invariant
                            linear system.
Architecture of the overall
program with ADFs:          One result-producing branch and one one-argument
                            function-defining branch.
Parameters:                 Branch typing.
Terminal set for the
result-producing branch:    The time T and the random constants \Re_{bigger-reals}.
Function set for the
result-producing branch:    +, -, *, %, EXPP, IFLTE, and the one-argument defined
                            function ADF0.
Terminal set for the
function-defining
branch ADF0:                The time T, the dummy variable ARG0, and the
                            random constants \Re_{bigger-reals}.
Function set for the
function-defining
branch ADF0:                +, -, *, %, EXPP, and IFLTE.
However, the function set, \mathcal{F}_{rpb}, for the result-producing branch contains the
automatically defined function ADF0, so that
    \mathcal{F}_{rpb} = \{ ADF0, +, -, *, \%, EXPP, IFLTE \}
with an argument map of
    {1, 2, 2, 2, 2, 1, 4}.
Table 11.2 summarizes the key features of the impulse-response problem
with automatically defined functions.
11.6 RESULTS OF ONE RUN WITH ADFs
In generation 0 of one run of this problem using automatically
defined functions, the worst impulse response function in the population is
(progn (defun ADF0 (ARG0)
         (values (EXPP (+ ARG0 T))))
       (values (* T (ADF0 (+ T T))))).
In this program, ADF0 returns the exponential of the sum of its argument, ARG0, and
T, so that when the result-producing branch calls its ADF0 with the numerical
argument (+ T T), ADF0 returns e^{3t}.
The result-producing branch then multiplies the returned value by T so
that this individual program as a whole is equivalent to t e^{3t}.
This sharply monotonically increasing function of t bears no resemblance
to the correct impulse function of the system. Its fitness has the enormous
value of 1.6 x 10^77.
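The way the global terminal T enters the automatically defined function can be seen by transcribing this individual into Python. This is an illustrative rendering only: `worst_program` is a hypothetical name, and the exponential is left unprotected, so it matches the evolved program only where no overflow protection would be triggered.

```python
import math


def worst_program(t):
    """Python rendering of the generation-0 program (unprotected exp)."""

    def adf0(arg0):
        # The time t is visible inside ADF0, mirroring the global terminal T.
        return math.exp(arg0 + t)

    # Result-producing branch: (* T (ADF0 (+ T T)))
    return t * adf0(t + t)


value = worst_program(0.5)
```

Since ADF0 receives 2t and adds t itself, the exponent is 3t, and the whole program evaluates to t e^{3t}.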
The median individual of the population is
(progn (defun ADF0 (ARG0)
         (- ARG0 -8.354))
       (values (ADF0 T))).
This program is equivalent to the following simple linear function of t:
    t + 8.354.
The fitness of this monotonically increasing function of t is 2,249,945.
The fitness of the best-of-generation impulse response function for generation 0 is 101.03 (i.e., an average squared error of about 1.68 per fitness case).
This best-of-generation program has 35 points, has only one call to ADF0,
and is
(progn (defun ADF0 (ARG0)
         (values (* (+ (IFLTE T T ARG0 ARG0) (* T ARG0))
                    (* (EXPP 4.152) (EXPP 9.581)))))
       (values (% (IFLTE (% T 1.364) (EXPP -2.113)
                         (ADF0 T) (% -2.421 T))
                  (% (- T 6.653) (% T T))))).
When the defined function ADF0 is called with the argument T, it simplifies to
    926370 (t + t^2).
ADF0 is called by the result-producing branch only when t = 0, so the result-producing branch is equivalent to
    \begin{cases} 0 & \text{if } t = 0 \\ \dfrac{-2.421}{t^2 - 6.653 t} & \text{otherwise.} \end{cases}
Figure 11.14 compares the best-of-generation impulse response function from
generation 0 and the correct impulse response, H(t). As can be seen, there is
little resemblance between this best of generation 0 and the correct impulse
response for the problem. Indeed, the two rarely even have the same sign.
By generation 5, the fitness of the best-of-generation program improves
slightly to 98.42. This individual has 60 points and is shown below:
(progn (defun ADF0 (ARG0)
         (values (* (+ (IFLTE T T ARG0 ARG0)
                       (- (IFLTE (EXPP -4.936) (% T ARG0) (- T T) T)
                          (EXPP (* T T))))
                    (* (EXPP 4.152) (EXPP -3.399)))))
       (values (% (IFLTE (% T (EXPP (ADF0 (+ (ADF0 -5.769)
                                            (- (% T T) (IFLTE 6.518 T 3.851 -0.087))))))
                         (EXPP -2.113)
                         (ADF0 T)
                         (% -2.421 T))
                  (% (- T 6.653) (% T T))))).
Figure 11.14 Comparison of best of generation 0 with ADFs (whose fitness is 101.03) with the
correct impulse response function.
The result-producing branch of this individual calls its ADF0 three times.
In one instance, ADF0 is called with the constant -5.769 as its argument,
namely
(ADF0 -5.769).
Second, ADF0 is called with a larger expression as its argument, namely
(ADF0 (+ (ADF0 -5.769) (- (% T T) (IFLTE 6.518 T 3.851
-0.087)))).
The first call to ADF0 is embedded inside this second call.
Third, ADF0 is called with just T as its argument (near the end of the result-producing branch).
The defined function ADF0 can be simplified to
    \begin{cases} 2.12338 \, (arg0 - e^{t^2}) & \text{if } t / arg0 \ge 0.00718 \\ 2.12338 \, (arg0 + t - e^{t^2}) & \text{otherwise.} \end{cases}
By generation 15, the fitness of the best-of-generation program improves to 43.47.
This individual has 36 points, calls its particular ADF0 twice, and is shown below:
(progn (defun ADF0 (ARG0)
         (values (% (- 7.732 ARG0)
                    (* (IFLTE -5.295 -3.354 T 1.567)
                       (EXPP (* T -5.788))))))
       (values (% (IFLTE (% T 1.364) (% (- T 6.653) (% T T))
                         (ADF0 T) (% -2.421 (ADF0 T)))
                  (% T 1.364)))).
The result-producing branch of this program is equivalent to
    \begin{cases} \dfrac{-3.3022}{adf(t) \, t} & \text{if } t < 25 \\ \dfrac{1.364 \, adf(t)}{t} & \text{otherwise.} \end{cases}
Since the defined function ADF0 is called with the argument T in both instances,
it simplifies to
    adf(t) = \frac{7.732 - t}{t \, e^{-5.788 t}},
thereby making the result-producing branch equivalent to
    \begin{cases} \dfrac{-3.3022 \, e^{-5.788 t}}{7.732 - t} & \text{if } t < 25 \\ \dfrac{1.364 \, (7.732 - t)}{t^2 \, e^{-5.788 t}} & \text{otherwise.} \end{cases}
Figure 11.15 compares the genetically produced best-of-generation impulse
response function from generation 15 and the correct impulse response. As
can be seen, this individual has a small negative hump approximately where
the correct impulse response has a large negative hump, but otherwise bears
little resemblance to the correct impulse response function.
Figure 11.16 shows that, as we proceed from generation to generation, the
standardized fitness of the best-of-generation program tends to improve progressively.
By generation 50, the best-of-generation program shown below has 81
points, calls its ADF0 twice, and has a fitness value of 11.38 (i.e., an average
squared error of only about 0.19 per fitness case):
(progn (defun ADF0 (ARG0)
         (values (% (- 6.511 T)
                    (* (% (% (% (+ T ARG0) (- 5.141 -3.671))
                             (- 6.511 0.421))
                          (* (IFLTE -5.295 -3.354 T 1.567)
                             (EXPP (* T -5.788))))
                       (EXPP (* T -5.788))))))
       (values (% (IFLTE (% (- T 6.653) (% T (- T 6.653)))
                         (% T 1.364)
                         (IFLTE (% T 1.364)
                                (% (- T 6.653) (% T T))
                                1.364
                                (% -2.421 (ADF0 T)))
                         (% -2.421 (ADF0 T)))
                  (% (- T 6.653)
                     (% T (% (% T 1.364) (% T (% T 1.364)))))))).
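Since both calls to the defined function ADF0 are with the argument T, ADF0
simplifies to a function of t alone, adf(t). The result-producing branch is equivalent to
    \begin{cases} \dfrac{2.5377}{t - 6.653} & \text{if } 25 \le t < 47 \\ \dfrac{-4.5042}{adf(t) \, [t - 6.653]} & \text{otherwise.} \end{cases}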
Figure 11.15 Comparison of the best of generation 15 with ADFs (whose fitness is 43.47) with
the correct impulse response function.
Figure 11.16 Fitness of best-of-generation program with ADFs.
Figure 11.17 compares the genetically produced best-of-generation impulse
response function from generation 50 and the correct impulse response. As
can be seen, the two curves are substantially similar.
The performance of the genetically produced best-of-generation impulse
response functions from generations 0, 15, and 50 can be appreciated by
examining the response of the system to the square input.
Figure 11.18 compares the plant response (which is the same in all three
panels of the figure) to the square input and the response to the square input
using the best-of-generation impulse response functions from generations 0,
15, and 50. As can be seen in the first and second panels of this figure, the
performance of the best-of-generation programs from generations 0 and 15 is
not very good, although generation 15 is considerably better than generation
0. However, as can be seen in the third panel of this figure, the performance of
the best of generation 50 is close to the plant response (the total squared error
being only 11.38 over the 60 fitness cases).
The number of hits improves from 14 for the best-of-generation program
of generation 0, to 25 for generation 15, and to 41 for generation 50.
The ability of the genetically evolved impulse response function to generalize can be demonstrated by considering the same four additional forcing
functions to the system: the ramp input, the unit-step input, the shorter
unit-square input, and the noise signal. Note that since we are operating
in discrete time, there is no generalization of the system in the time
domain. That is, there is no need to simulate the system with a finer temporal granularity.
Figure 11.17 Comparison of the best of generation 50 with ADFs (whose fitness is 11.38) with
the correct impulse response function.
Figure 11.19 shows the plant response to the ramp input and the response
to the ramp input using the best-of-generation programs from generations 0,
15, and 50. As can be seen, the performance of the best of generation 50 is
close to the plant response for the ramp input (the total squared error being
only 5.38 over the 60 fitness cases).
Figure 11.20 compares the plant response to the step input and the response
to the step input using the best-of-generation programs from generations 0,
15, and 50. As can be seen, the performance of the best of generation 50 is also
close to the plant response for the unit-step input (the total squared error
being only 19.86 over the 60 fitness cases).
Figure 11.21 compares the plant response to the short unit-square input
and the response to the short unit-square input using the best-of-generation
programs from generations 0, 15, and 50. The performance of the best of generation 50 is also close to the plant response for the short unit-square input (the total
squared error being only 12.59 over the 60 fitness cases).
Figure 11.22 compares the plant response to the noise signal (figure 11.11)
and the response to this noise signal using the best-of-generation programs
from generations 0, 15, and 50. As can be seen, the performance of the best of
generation 50 is also close to the plant response for the noise signal (the total
squared error being only 6.40 over the 60 fitness cases). The total squared error
is 17.17 for generation 0 and about 10.5 for generation 15.
The hits histogram is a useful monitoring tool for visualizing the progressive learning of the population as a whole during a particular run. The horizontal axis of the hits histogram represents the number of hits (0 to 60) while
the vertical axis represents the number of individuals in the population (0 to
4,000) scoring that number of hits.
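A hits histogram of this kind is simply a tally of the population by hit count. The following sketch assumes that the number of hits for each individual has already been computed; `hits_histogram` is a hypothetical name and the four-member sample population is invented for illustration.

```python
from collections import Counter


def hits_histogram(hits_per_individual, n_cases=60):
    """histogram[h] = number of individuals scoring exactly h hits."""
    counts = Counter(hits_per_individual)
    return [counts[h] for h in range(n_cases + 1)]


# Invented sample: one individual with 0 hits, two with 3, one with 60.
histogram = hits_histogram([0, 3, 3, 60])
```

The list has 61 entries (one for each possible score from 0 to 60) and its entries sum to the population size.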
Correct Impulse Response
+ Generation 50lmpulse Response
328 Chapter 11
q)
16
-
*)
aF
F
a
(l)
E!t ...)
-
a
F
F
E
q)
+J
a -
-
ar -
F
F
E
Plant Response
-a- Generation 0
Time
Plant Response
+ Generation 50
0 20 Time 40 60
Figure 11.18 Response of the best-of-generation programs from generations 0, 15, and 50 to
'the
square input withADFs.
Plant Response
+ Generation 15
Finding an Impulse Response Function
q)
E
v
!a
t)
. I
x
F
F
E
q)
-a
-
-
+)
-
x
F
I
q)
-1
t
-
!)
-
A
-
?1
FPlant Response
-.- Generation 0
Time
Figure 11.19 Response of the best-of-generation programs from generations 0, 15, and 50 to
the ramp input with ADFs.
330
Time
Plant Response
+ Generation 15
Plant Response
+ Generation 50
Time 40
Chapter 11
q)
-
.J
tara
. I
Fr -
F
F
-
q) 'tt
,ara
x
ai
F
I
q)
FI
)
I
-
A
x
F
F
E
Plant Response
-a- Generation 0
Time
Plant Response
-a- Generation 15
Plant Response
-+ Generation 50
0 20 Time 40 60
Figure 11.20 Response of the best-of-generation programs from generations 0, 15, and 50 to
the unit-step input withADFs.
Finding an Impulse Response Function
[Figure: three panels plotting the Plant Response against the response of the best of Generation 0, Generation 15, and Generation 50; horizontal axis Time, 0 to 60.]
Figure 11.21 Response of the best-of-generation programs from generations 0, 15, and 50 to
the short unit-square input with ADFs.
[Figure: three panels plotting the Plant Response against the response of the best of Generation 0, Generation 15, and Generation 50; horizontal axis Time, 0 to 60.]
Figure 11.22 Response of the best-of-generation programs from generations 0, 15, and 50 to
the noise signal with ADFs.
[Figure: three hits histograms, one each for generations 0, 15, and 50; vertical axis up to 2,500 individuals, horizontal axis 0 to 60 hits.]
Figure 11.23 Hits histograms for generations 0, 15, and 50 of the impulse-response problem
with ADFs.
Figure 11.23 shows the hits histograms for generations 0, 15, and 50 of
this run.
11.7 GENEALOGICAL AUDIT TRAIL WITH ADFs
The creative role of crossover is illustrated by an examination of the genealogical audit trail for the best-of-generation programs for generations
15 and 50 of this run. As it happens, the fitness of the best-of-generation
program improves sharply between generations 14 and 15 and between
generations 49 and 50. Specifically, the fitness of the best-of-generation
program is 49.96 for generation 14 and 43.47 for generation 15, and it is
12.09 for generation 49 and 11.38 for generation 50. The crossover producing the best-of-generation program for generation 15 (involving two parents from generation 14) occurs in the result-producing branch. Moreover,
the crossover producing the best-of-generation program for generation 50
(involving two parents from generation 49) occurs in the function-defining branch.
11.7.1 Crossover in the Result-Producing Branch
As previously mentioned, the best of generation 15 has a fitness of 43.47, calls
its ADF0 twice, and is shown below:
(progn (defun ADF0 (ARG0)
         (values (% (- 7.732 ARG0)
                    (* (IFLTE -5.295 -3.354 T 1.567)
                       (EXPP (* T -5.788))))))
       (values (% (IFLTE (% T 1.364)
                         (% (- T 6.653) (% T T))
                         (ADF0 T)
                         (% -2.421 (ADF0 T)))
                  (% T 1.364)))).
The best of generation 15 is one of the offspring resulting from a crossover
involving the result-producing branch of the seventh-best individual from
generation 14 (which we will call "parent A" for the duration of this discussion of generation 14) and the 197th-best individual from generation 14 (which
we will call "parent B").
Parent A from generation 14 has 40 points, has a fitness value of 57.20 (i.e.,
not the best of its generation), scores 25 hits, has two calls to its ADF0, and is
shown below:
(progn (defun ADF0 (ARG0)
         (values (% (- 7.732 ARG0)
                    (* (IFLTE -5.295 -3.354 T 1.567)
                       (EXPP (* T -5.788))))))
       (values (% (IFLTE (% T 1.364)
                         (% (- T 6.653) (% T T))
                         (ADF0 T)
                         (% -2.421 (ADF0 T)))
                  (% (- T 6.653) (% T T)))))
Parent B from generation 14 has 37 points, has a fitness value of 83.74, scores
18 hits, calls its ADF0 twice, and is shown below:
(progn (defun ADF0 (ARG0)
         (values (EXPP (- T ARG0))))
       (values (% (IFLTE (% T 1.364)
                         (% (- T 6.653) (% T T))
                         (% T T)
                         (IFLTE (% T 1.364) (EXPP -2.113)
                                -2.421 (% -2.421 T)))
                  (% (- T 6.653) (% T T))))).
As previously mentioned, the name ADF0 refers to the automatically
defined function defined within the particular individual program involved
(i.e., parent A, parent B, and the offspring in generation 15).
Figure 11.24 compares parents A and B with the correct impulse response
for the system as a function of t. As can be seen, parent A differs significantly
from the correct impulse response between time steps 6 and 16. However,
parent A resembles the correct impulse response in that it is zero between
times 1 and 6, has a negative hump (albeit smaller) between times 17 and 24,
has a positive hump (albeit very much smaller) between times 25 and 40, and
is near zero after time 40.
The crossover points within parents A and B from generation 14 are both
in the result-producing branches. The best of generation 15 consists of all
of parent A from generation 14 except for its underlined portion. The crossover operation inserts the underlined portion of parent B into parent A
(at the underlined location within parent A) in order to create the best of
generation 15. Specifically, parent A from generation 14 contains the
expression
(% (- T 6.653) (% T T))
in its result-producing branch. This expression is equivalent to
t - 6.653.
However, the best of generation 15 contains the expression
(% T 1.364)
which is equivalent to
t / 1.364.
Both of these expressions represent straight lines. Moreover, the two
expressions have somewhat similar slopes (1.00 and 0.73, respectively)
and somewhat similar y-intercepts (-6.653 and zero, respectively). When
these expressions appear in a denominator, they are very similar for large
values of t. The expression from parent B ends up in the denominator
of the offspring best-of-generation program of generation 15 and it
replaces the expression from parent A (which is in the denominator of
parent A).
[Figure: two panels plotting the Correct Impulse Response against the Parent A Impulse Response and the Parent B Impulse Response; horizontal axis Time, 0 to 60.]
Figure 11.24 Correct impulse response function compared to the impulse response functions
for parents A and B from generation 14 with ADFs.
[Figure: Fragment from Parent A and Fragment from Parent B plotted against Time.]
Figure 11.25 Comparison of expressions from parents A and B from generation 14 with ADFs.
Figure 11.25 shows the reciprocals of both of these linear expressions as a
function of t. The curves represent the expressions from parents A and B from
generation 14. As can be seen, for larger values of t, these two curves are
virtually identical, whereas, for small values of t, they differ substantially.
Note that the value of the curve for parent A at time 0 is the consequence of
our definition of the protected division function %.
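The comparison of the two reciprocals can be checked numerically. The sketch below is illustrative Python, not the LISP used for the runs. It assumes the usual convention that protected division returns 1 when the denominator is 0; the names `pdiv`, `fragment_a`, `fragment_b`, and `reciprocal_gap` are introduced only for this example.

```python
def pdiv(numerator, denominator):
    """Protected division: return 1 when the denominator is 0 (assumed convention)."""
    return 1.0 if denominator == 0 else numerator / denominator

def fragment_a(t):
    # (% (- T 6.653) (% T T)): equals t - 6.653, because (% T T) is 1 even at t = 0
    return pdiv(t - 6.653, pdiv(t, t))

def fragment_b(t):
    # (% T 1.364)
    return pdiv(t, 1.364)

def reciprocal_gap(t):
    """Absolute difference between the reciprocals of the two fragments at time t."""
    return abs(pdiv(1.0, fragment_a(t)) - pdiv(1.0, fragment_b(t)))

# Print the gap at a few sample times; it shrinks as t grows.
for t in (0, 5, 20, 40, 60):
    print(t, round(reciprocal_gap(t), 4))
```

Under these assumptions the gap is large near t = 0 and t = 5 and falls below 0.01 by t = 20, which is the sense in which the two fragments are interchangeable in a denominator for later time steps.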
The first two panels of figure 11.26 show the square input responses as
a function of t for parents A and B from generation 14. As can be seen,
both parents A and B are reasonably similar to the plant's response (representing the actual impulse response of the system) after about time 20.
However, parent A from generation 14 differs considerably from the plant
response between about time 10 and time 20, while parent B from generation 14 is much more similar to the plant response during the same period
of time. The third panel of this figure shows that, after the crossover which
created the best-of-generation program for generation 15, the square
input response for those early times is much closer to the plant response
than in generation 14.
In other words, the effect of the crossover that created the best of generation 15 from the two parents from generation 14 is to improve performance
for a portion of the time domain. The difference between these two parents is
sufficient to cause their offspring in generation 15 to have a fitness of 43.47
whereas parent A has a fitness of 57.20 and parent B has a fitness of 83.74. We
have observed similar "case handling" behavior by the crossover operation
in many other problems, including the Boolean 11-multiplexer problem in
Genetic Programming (subsection 7.4.1) and in the videotape (Koza and Rice
1992a). That is, crossover recombines parts of the structures of the parents so
as to improve fitness.
11.7.2 Crossover in the Function-Defining Branch
As previously mentioned, the best of generation 50 calls its ADF0 twice, has a
fitness value of 11.38, and is
(progn (defun ADF0 (ARG0)
         (values (% (- 6.511 T)
                    (* (% (% (% (+ T ARG0) (- 5.141 -3.671))
                             (- 6.511 0.421))
                          (* (IFLTE -5.295 -3.354 T 1.567)
                             (EXPP (* T -5.788))))
                       (EXPP (* T -5.788))))))
       (values (% (IFLTE (% (- T 6.653) (% T (- T 6.653)))
                         (% T 1.364)
                         (IFLTE (% T 1.364)
                                (% (- T 6.653) (% T T))
                                1.364
                                (% -2.421 (ADF0 T)))
                         (% -2.421 (ADF0 T)))
                  (% (- T 6.653)
                     (% T (% (% T 1.364) (% T (% T 1.364)))))))).
This individual is one of the offspring resulting from a crossover
involving the function-defining branches of the best of generation 49 (which
we will call "parent C" for the duration of this discussion of generation
49) and the 85th-best individual from generation 49 (which we will call
"parent D").
[Figure: three panels plotting the Plant Response against Parent A, Parent B, and Generation 15; horizontal axis Time, 0 to 60.]
Figure 11.26 Correct square input response function compared to the square input response
functions for parents A and B from generation 14 and for the best of generation 15 with ADFs.
Parent C from generation 49 has 107 points and a fitness value of 12.09
(which happens to be tied with the fitness of the best of generation 49). It
scores 37 hits, calls its ADF0 twice, and is shown below:
(progn (defun ADF0 (ARG0)
         (values (% (- 6.511 T)
                    (* (% (% (% (+ T ARG0) (- 5.141 -3.671))
                             (IFLTE (- 6.511 (IFLTE T T T ARG0))
                                    (IFLTE (% T T) (* T ARG0) ARG0 (* T ARG0))
                                    (IFLTE T T ARG0 ARG0)
                                    (IFLTE -5.295 -3.354 T 1.567)))
                          (* (IFLTE -5.295 -3.354 T 1.567)
                             (EXPP (* T -5.788))))
                       (EXPP (* T -5.788))))))
       (values (% (IFLTE (% (- T 6.653) (% T (- T 6.653)))
                         (% T 1.364)
                         (IFLTE (% T 1.364)
                                (% (- T 6.653) (% T T))
                                1.364
                                (% -2.421 (ADF0 T)))
                         (% -2.421 (ADF0 T)))
                  (% (- T 6.653)
                     (% T (% (% T 1.364) (% T (% T 1.364)))))))).
Parent D from generation 49 has 168 points, has a fitness value of 14.47,
scores 28 hits, calls its ADF0 once, and is shown below:
(progn (defun ADF0 (ARG0)
         (values
          (% (- 6.511 T)
             (* (% (% (% (+ T ARG0) (- 5.141 -3.671))
                      (IFLTE (- 6.511 (IFLTE T T T ARG0))
                             (IFLTE (- 6.511 0.421) (* T ARG0) ARG0 (* T -5.788))
                             (IFLTE T T ARG0 ARG0)
                             (- ARG0 T)))
                   (* (IFLTE -5.295
                             (% (% -9.522 0.421)
                                (IFLTE (IFLTE T T T ARG0)
                                       (IFLTE (IFLTE T ARG0 T ARG0)
                                              (* T ARG0)
                                              (IFLTE (IFLTE T T -3.671 ARG0)
                                                     (+ (IFLTE T T ARG0 ARG0) (* T ARG0))
                                                     (IFLTE T T ARG0 ARG0)
                                                     (- 6.511 T))
                                              (* (- T ARG0) (IFLTE T T ARG0 ARG0)))
                                       (IFLTE T T ARG0 ARG0)
                                       (- 6.511 T)))
                             T
                             1.567)
                      (EXPP (* T -5.788))))
                (EXPP (* T -5.788))))))
       (values (% (IFLTE (% T 1.364)
                         (% (- T 6.653) (% T T))
                         (IFLTE (% T 1.364)
                                (% (- T 6.653) (% T T))
                                (% T (% (% (- T 6.653) (% T T)) (% T (% T 1.364))))
                                (- T 6.653))
                         (% -2.421 (ADF0 T)))
                  (% (- T 6.653) (% T (% T 1.364)))))).
The first two panels of figure 11.27 compare parents C and D from generation 49 with the correct impulse response for this system as a function of t.
Since both parents C and D are high ranking individuals from an advanced
generation of a successful run, these two individuals are reasonably similar
to the correct impulse response of the system. Nonetheless, both parents C
and D from generation 49 differ from each other and from the correct impulse
response. The third panel of this figure shows that, after the crossover that
created the best-of-generation program for generation 50, the square input
response for those early times is substantially closer to the plant response
than in generation 49.
The crossover points within parents C and D from generation 49 are
both in the function-defining branches. As it happens, both of the two
calls to ADF0 in the result-producing branch of the best of generation 50,
both of the two calls to ADF0 in the result-producing branch of parent C
from generation 49, and the one call to ADF0 in the result-producing branch
of parent D from generation 49 all use just T as their arguments. Because
of this, we are able to visualize the behavior of all three of the function-defining branches.
[Figure: three panels plotting the Correct Impulse Response against the Parent C, Parent D, and Generation 50 impulse responses; horizontal axis Time, 0 to 60.]
Figure 11.27 Correct impulse response function compared to the impulse response functions
for parents C and D from generation 49 and for the best of generation 50 with ADFs.
[Figure: Parent C from Generation 49 and Offspring from Generation 50 plotted against Time, 0 to 60.]
Figure 11.28 Superimposition of the impulse response of parent C from generation 49 onto the
best-of-generation impulse response of generation 50.
Figure 11.28 provides another visualization of the effect of crossover
by superimposing the impulse response of parent C from generation 49
onto the impulse response of the best of generation 50. The horizontal
arrow near the top of the figure and the vertical arrow near the negative
vertical axis of the figure highlight the slight differences between the
two curves.
Figure 11.29 shows the difference between the two curves. As can be seen,
there is a noticeable difference at t = 1 and t = 7. Between t = 16 and t = 60,
the difference is 0.
Figure 11.30 shows the behavior of the function-defining branches of both
parents C and D of generation 49 as a function of t (ADF0 being called with an
argument of t for these two parents).
The best of generation 50 consists of all of parent C from generation 49
except for its underlined portion. The crossover operation inserts the underlined portion of parent D into parent C, instead of the underlined portion of
parent C, in order to create the best of generation 50. The underlined portion
of parent D is the small subexpression
(- 6.511 0.421)
which is equivalent to the numerical constant value 6.090.
[Figure: difference curve plotted against Time, 0 to 60.]
Figure 11.29 Difference between parent C of generation 49 and the best of generation 50.
[Figure: function-defining branches of Parent C and Parent D plotted against Time.]
Figure 11.30 Behavior of the function-defining branches of parents C and D from generation
49 when the argument to ADF0 is T.
The underlined portion of parent C (which is replaced to create the best-of-generation program for generation 50) consists of
(IFLTE (- 6.511 (IFLTE T T T ARG0))
       (IFLTE (% T T) (* T ARG0) ARG0 (* T ARG0))
       (IFLTE T T ARG0 ARG0)
       (IFLTE -5.295 -3.354 T 1.567)).
When ARG0 is T, this entire expression reduces to merely T.
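The reduction to T can be confirmed by direct evaluation. The Python sketch below is illustrative only. It assumes that IFLTE returns its third argument when its first argument is less than or equal to its second, and its fourth argument otherwise, and that protected division returns 1 for a zero denominator; the function names and the sampled range of time steps are hypothetical.

```python
def pdiv(numerator, denominator):
    """Protected division: return 1 when the denominator is 0 (assumed convention)."""
    return 1.0 if denominator == 0 else numerator / denominator

def iflte(a, b, c, d):
    """Four-argument conditional: c if a <= b, otherwise d (assumed semantics)."""
    return c if a <= b else d

def replaced_subtree(t, arg0):
    # (IFLTE (- 6.511 (IFLTE T T T ARG0))
    #        (IFLTE (% T T) (* T ARG0) ARG0 (* T ARG0))
    #        (IFLTE T T ARG0 ARG0)
    #        (IFLTE -5.295 -3.354 T 1.567))
    return iflte(6.511 - iflte(t, t, t, arg0),
                 iflte(pdiv(t, t), t * arg0, arg0, t * arg0),
                 iflte(t, t, arg0, arg0),
                 iflte(-5.295, -3.354, t, 1.567))

# Evaluate the subtree with ARG0 bound to T over a hypothetical range of time steps.
values = [replaced_subtree(t, t) for t in range(60)]
```

Whichever way the outer comparison falls, the third argument evaluates to ARG0 and the fourth evaluates to T, so with ARG0 bound to T the list `values` is simply the time steps themselves.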
Thus, the effect of the crossover in the function-defining branches of
these two parents is to exchange a numerical constant for the variable t in
parent C. The function-defining branch of the best of generation 50 then
reduces to just
(% (- 6.511 T)
   (* (% (% (% (+ T ARG0) (- 5.141 -3.671))
            6.090)
         (* (IFLTE -5.295 -3.354 T 1.567)
            (EXPP (* T -5.788))))
      (EXPP (* T -5.788)))).
[Figure: function-defining branches of Parent C and Generation 50 plotted against Time.]
Figure 11.31 Behavior of the function-defining branches of parent C from generation 49 and
the best of generation 50 when the argument to ADF0 is T.
Figure 11.31 shows the behavior of the function-defining branch of parent
C from generation 49 and the function-defining branch of the best of generation 50 as a function of t. As can be seen, the effect of this change is a small
change for the early time steps.
The first two panels of figure 11.32 show the square input responses as a
function of t for parents C and D from generation 49. Parents C and D differ
from each other and from the plant's square input response. The third panel
of this figure shows that, after the crossover that created the best-of-generation program for generation 50, the square input response for those early times
is closer to the plant response than in generation 49.
The difference between these two parents was sufficient to cause their offspring in generation 50 to have a fitness of 11.38 while parent C has a fitness
of 12.09 and parent D has a fitness of 14.47.
11.8 RESULTS OF SERIES OF RUNS WITH ADFs
The average structural complexity, Swith, of the best-of-run programs from 13
successful runs (out of 18 runs) of the impulse-response problem is 157.7 points
with automatically defined functions.
Automatically defined functions facilitate the discovery of an impulse
response function by reducing the amount of computational effort required
to solve the problem.
[Figure: three panels plotting the Plant Response against Parent C, Parent D, and Generation 50; horizontal axis Time, 0 to 60.]
Figure 11.32 Correct square input response function compared with the square input response
functions for parents C and D from generation 49 and for the best of generation 50 with ADFs.
[Figure: performance curves P(M,i) and I(M,i,z) with defined functions, for M = 4,000 and z = 99%, plotted against Generation; marked points (47, 72%) and (50, 72%).]
Figure 11.33 Performance curves for the impulse-response problem showing that
Ewith = 768,000 with ADFs.
Table 11.3 Comparison table for the impulse-response problem.
                                  Without ADFs    With ADFs
Average structural complexity S   285.9           157.7
Computational effort E            1,120,000       768,000
[Figure: bar graphs of S and E without ADFs and with ADFs; RS = 1.81.]
Figure 11.34 Summary graphs for the impulse-response problem.
Figure 11.33 presents the performance curves based on the 18 runs of the
impulse-response problem with automatically defined functions. The cumulative probability of success, P(M,i), is 72% by generations 47 and 50. The
two numbers in the oval indicate that if this problem is run through to
generation 47, processing a total of Ewith = 768,000 individuals (i.e., 4,000 x 48
generations x 4 runs) is sufficient to yield a satisfactory result for this problem
with 99% probability.
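The arithmetic behind this figure can be reproduced in a few lines. The Python sketch below is illustrative. It assumes the standard relation between the cumulative probability of success P(M,i) and the number of independent runs R(z) required to reach probability z, namely R(z) = ceiling(log(1 - z) / log(1 - P(M,i))), which this section uses but does not restate; the function names are hypothetical.

```python
import math

def runs_required(p_success, z=0.99):
    """Number of independent runs R(z) needed to succeed with probability z."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))

def individuals_to_process(population_size, generation, p_success, z=0.99):
    """I(M, i, z) = M * (i + 1) * R(z); generation 0 counts as one generation."""
    return population_size * (generation + 1) * runs_required(p_success, z)

# M = 4,000, generation 47, P(M,i) = 72%, z = 99%
runs = runs_required(0.72)
effort_with_adfs = individuals_to_process(4000, 47, 0.72)
```

With P(M,i) = 0.72 the quotient is about 3.6, so four runs are required, and 4,000 x 48 x 4 gives the 768,000 individuals quoted above.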
11.9 SUMMARY
We have demonstrated the use of genetic programming, both with and without automatically defined functions, to evolve a good approximation, in symbolic form, to an impulse response function for an unknown time-invariant
linear system using only the observed discrete-time response of the unknown
system to a unit-square input.
Table 11.3 compares the average structural complexity, Swithout and Swith,
and the computational effort, Ewithout and Ewith, for the impulse-response
problem.
Figure 11.34 summarizes the information in this comparison table and shows
a structural complexity ratio, RS, of 1.81, and an efficiency ratio, RE, of 1.46.
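Both ratios follow directly from the entries of the comparison table. The short Python check below is illustrative and takes the table entries to be 285.9 and 157.7 points for structural complexity and 1,120,000 and 768,000 individuals for computational effort, which are the readings consistent with the ratios stated above; the variable names are hypothetical.

```python
def ratio(without_adfs, with_adfs):
    """Ratio of the value without ADFs to the value with ADFs."""
    return without_adfs / with_adfs

structural_complexity_ratio = round(ratio(285.9, 157.7), 2)    # RS
efficiency_ratio = round(ratio(1_120_000, 768_000), 2)         # RE
```

A ratio greater than 1 means that the quantity is smaller with automatically defined functions than without them.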
12 Artificial Ant on the San Mateo Trail
The artificial ant problem in this chapter shows that automatically defined
functions can be beneficial when the problem environment contains only a
modest amount of regularity. The amount of regularity present in this problem environment is not as great as that of the Boolean even-parity problem
(chapter 6), the lawnmower problem (chapter 8), or the bumblebee problem
(chapter 9). The regularity that is potentially exploitable by a reusable subprogram in this problem consists of a common inspecting motion that can be
undertaken in only a few directions.
In addition, in this problem all the information that is transmitted to the
subprograms is by means of side effects on the state of a system.
12.1 THE PROBLEM
In this problem, the goal is to find a program for controlling the movement of
an artificial ant so as to find all of the food lying along a series of irregular
trails on a two-dimensional toroidal grid.
The ant's sensory ability is limited to sensing the presence or absence of
food in the single square of the grid that the ant is currently facing. The ant's
potential actions are limited to turning right, turning left, and moving forward one square.
The San Mateo trail consists of nine parts, each made up of a square 13-by-13 grid containing different discontinuities in the sequence of food. The
discontinuities include single and double gaps, corners where a single piece
of food is missing, corners where there are two pieces of food that are missing
in the trail's current direction, and corners where there are two pieces of food
that are missing to the left or to the right of the current direction of the trail.
The original version of this problem was developed and solved by Jefferson
et al. 1991 and Collins and Jefferson 1991 using both a finite-state automaton
and a neural network for the John Muir trail (Genetic Programming, subsection
3.3.2). A solution to this original version of the problem using genetic programming for the Santa Fe trail is described in Genetic Programming (section
7.2). In addition, this original version of the problem was solved using genetic
programming on the more difficult Los Altos Hills trail as described in Genetic
Programming (section 7.2). This more difficult trail has some discontinuities in
which the food is displaced by two squares to the left or to the right of the
trail's current direction.
The San Mateo trail presented here is more difficult than the Los Altos Hills
trail in that it has discontinuities in which the food is displaced by two squares
to the left or right of the trail's current direction and then further displaced by
one square forward. In the Santa Fe trail and the Los Altos Hills trail, all the
different types of discontinuities appeared in a single trail, so the problem
had only one fitness case. We precluded trivial tessellating trajectories from
achieving high scores on the Los Altos Hills trail by embedding the food in a
large array of 10,000 squares. Unfortunately, this large array necessitated running each simulation for a large number of time steps (3,000). This, in turn,
consumed a large amount of computer time. In order to solve this more difficult version of the problem within the available amount of computer time,
we divided the trail into parts (fitness cases) and distributed the different
types of discontinuities among the parts. Trivial tessellating trajectories could
then be precluded with disproportionately fewer empty squares. To further
save computer time, we did not make the 13-by-13 grid toroidal. Instead, the
border of each 13-by-13 part of the trail is electrified so as to immediately
terminate the current fitness case should the ant ever wander into the electric
fence.
Figure 12.1 shows the nine parts of the San Mateo trail. Food is represented
by solid black squares. The starting point of the ant within each part is in the
middle of the top row (denoted by a small circle). The ant faces south at the
start of each fitness case. There are 96 pieces of food in the trail as a whole. For
convenience of illustration, gaps in the trail are indicated by gray squares;
however, the ant cannot distinguish between gray squares and white squares.
In the first of the nine parts of the trail, the only discontinuities in the trail are
one single gap and one double gap. In the second part, there are two single
gaps at corners, one double gap at a corner, and two additional single gaps.
The third part contains an instance of the most difficult discontinuity, namely
where food is missing in the current direction of the trail, where there is no
food to the left (or right) of the current direction of the trail, and where the
trail resumes two squares to the left (or right). Although this figure gives us
global knowledge of the trail, the ant's sensors give it only a very narrow
local view of its world. Since there is no food on the gray squares, the ant
need not actually visit them.
12.2 PREPARATORY STEPS WITHOUT ADFs
RIGHT, LEFT, and MOVE are operators that take no explicit arguments but
have side effects on the state of the system.
RIGHT changes the orientation of the ant by turning the ant to the right
(clockwise) by 90° (without moving the ant).
LEFT similarly changes the orientation of the ant to the left (counter-clockwise).
MOVE moves the ant forward in the direction it is currently facing. When an
ant moves into a square, it eats the food, if any, in that square (thereby removing that piece of food from that square). If there is no food, execution of the
program continues; however, if there is food, the eating of the food throws
the execution of the program back to its beginning. Thus, MOVE has side effects
on both the state of the ant and the state of the trail.
In accordance with our usual convention in this book, zero-argument side-effecting functions are treated as terminals. Thus, the terminal set, T, for this
problem consists of
T = { (RIGHT), (LEFT), (MOVE) }.
The function set, F, consists of
F = { IF-FOOD-AHEAD, PROGN }
with an argument map of
{2, 2}.
IF-FOOD-AHEAD permits the ant to sense the single adjacent square in the
direction the ant is currently facing. This conditional branching operator takes
two arguments and executes the first argument if (and only if) there is currently food in the single adjacent square in the direction the ant is currently
facing, and executes the second argument if (and only if) there is currently no
food in that square. This conditional branching operator is implemented in
LISP as a macro as follows:
1 #+TI (setf sys:inhibit-displacing-flag t)
2 (defmacro if-food-ahead (then-argument else-argument)
3   `(if *food-directly-in-front-of-ant-p*
4        (eval ',then-argument)
5        (eval ',else-argument))).
As can be seen on line 2 of this macro definition, two arguments are supplied
to the macro: the then-argument and the else-argument. On line 3 the
first argument of the if operator is the predicate *food-directly-in-front-of-ant-p*, which evaluates to T if uneaten food is present directly
in front of the ant, but which otherwise evaluates to NIL. This predicate acquires its value after a calculation involving the ant's current facing-direction and the current food status of the two-dimensional grid. If food is
present, the if operator causes the evaluation of the then-argument on
line 4, using the LISP evaluation function eval. If food is not present, the if
operator causes the evaluation of the else-argument on line 5 using eval.
Additional details are in Genetic Programming (subsection 6.1.1). Macros are
similarly used to implement the IF-OBSTACLE operator in section 13.2, the IF-MINE
operator in section 14.2, the IF operator in section 15.2, the IFLTE operator in
section 18.5.1, and the IFLTZ operator in section 18.10.
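The reason a macro is needed is that both arguments have side effects, so only the selected one may be evaluated. In a language without macros the same effect can be obtained by passing the branches as unevaluated callables. The Python sketch below illustrates this idea only; the class `Ant`, its attributes, and the function `if_food_ahead` are hypothetical stand-ins, not part of the LISP implementation described above.

```python
class Ant:
    """Toy stand-in for the ant's state; it holds only what the example needs."""
    def __init__(self, food_ahead):
        self.food_ahead = food_ahead   # plays the role of *food-directly-in-front-of-ant-p*
        self.actions = []              # record of the side-effecting operations performed

    def move(self):
        self.actions.append("MOVE")

    def right(self):
        self.actions.append("RIGHT")

def if_food_ahead(ant, then_branch, else_branch):
    """Run exactly one branch; the branches are zero-argument callables, not values."""
    if ant.food_ahead:
        then_branch()
    else:
        else_branch()

hungry_ant = Ant(food_ahead=True)
if_food_ahead(hungry_ant, hungry_ant.move, hungry_ant.right)   # only MOVE is executed

searching_ant = Ant(food_ahead=False)
if_food_ahead(searching_ant, searching_ant.move, searching_ant.right)   # only RIGHT is executed
```

Had the branches been passed as already-evaluated expressions, both MOVE and RIGHT would have been performed before the test was made, which is exactly what the macro avoids.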
The ant's goal is to eat as much food as possible from the nine parts of the
San Mateo trail.
[Figure: nine 13-by-13 grids, with a legend identifying the discontinuity types: single gap; double gap; corner, one missing; corner, two missing; corner, two missing to left of current trail.]
Figure 12.1 The nine parts of the San Mateo trail for the artificial ant problem.
The raw fitness of a particular program is the number of pieces of food
(from 0 to 96) eaten over the nine parts of the trail. Only the total number of
pieces of food accumulated over all nine fitness cases is available to genetic
programming.
A loop causes repeated invocation of a program until either time runs out
or the ant succeeds in eating all of the food in the current part of the trail. The
movement of the ant is terminated on any particular part of the trail when the
ant touches the electrified outer boundary of the 13-by-13 grid or when it has
executed a total of 120 RIGHT or LEFT turns or 80 MOVEs for the current part
of the trail. The amount of food eaten up to the time of termination on each
part of the trail is accumulated over the nine parts of the trail.
Standardized fitness is the total amount of available food (i.e., 96) minus
the raw fitness.
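The fitness measure just described can be sketched as a small simulator. The Python code below is an illustration under stated assumptions, not the implementation used for the runs: the electric fence is modeled as any square outside the 13-by-13 grid, the turn and move budgets are enforced when the next operation is attempted, a program is a Python callable that performs at least one ant operation per invocation, and the three tiny trail parts are hypothetical (they are not parts of the San Mateo trail).

```python
GRID_SIZE = 13
MAX_TURNS, MAX_MOVES = 120, 80
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # north, east, south, west as (row, col) steps

class FoodEaten(Exception):
    """Raised by a MOVE onto food: throws execution back to the program's beginning."""

class Terminated(Exception):
    """Raised on touching the fence or exhausting the turn or move budget."""

class Ant:
    def __init__(self, food):
        self.food = set(food)
        self.row, self.col = 0, GRID_SIZE // 2   # middle of the top row
        self.heading = 2                         # facing south
        self.turns = self.moves = self.eaten = 0

    def _ahead(self):
        d_row, d_col = HEADINGS[self.heading]
        return self.row + d_row, self.col + d_col

    def food_ahead(self):
        return self._ahead() in self.food

    def _turn(self, step):
        if self.turns >= MAX_TURNS:
            raise Terminated
        self.turns += 1
        self.heading = (self.heading + step) % 4

    def right(self):
        self._turn(1)                            # clockwise

    def left(self):
        self._turn(-1)                           # counter-clockwise

    def move(self):
        if self.moves >= MAX_MOVES:
            raise Terminated
        self.moves += 1
        self.row, self.col = self._ahead()
        if not (0 <= self.row < GRID_SIZE and 0 <= self.col < GRID_SIZE):
            raise Terminated                     # electric fence (assumed to lie off the grid)
        if (self.row, self.col) in self.food:
            self.food.remove((self.row, self.col))
            self.eaten += 1
            raise FoodEaten

def food_eaten_on_part(program, food):
    """Invoke the program repeatedly on one part of the trail; return the food eaten."""
    ant = Ant(food)
    try:
        while ant.food:
            try:
                program(ant)
            except FoodEaten:
                pass                             # restart the program from its beginning
    except Terminated:
        pass
    return ant.eaten

def raw_fitness(program, parts):
    return sum(food_eaten_on_part(program, food) for food in parts)

def standardized_fitness(program, parts):
    total_food = sum(len(food) for food in parts)
    return total_food - raw_fitness(program, parts)

def follow_or_turn(ant):
    """Hypothetical program: (IF-FOOD-AHEAD (MOVE) (RIGHT))."""
    if ant.food_ahead():
        ant.move()
    else:
        ant.right()

# Three hypothetical parts: a straight run, a corner, and a run with a single gap.
toy_parts = [[(1, 6), (2, 6), (3, 6)], [(1, 6), (1, 5)], [(1, 6), (3, 6)]]
toy_raw = raw_fitness(follow_or_turn, toy_parts)
toy_standardized = standardized_fitness(follow_or_turn, toy_parts)
```

On the third toy part the program spins in place at the gap until its 120 turns are used up, which shows how the turn budget bounds the time spent on a program that never crosses a gap.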
Table 12.1 Tableau without ADFs for the artificial ant problem.
Objective: Find a program to control an artificial ant so that it can find all 96 pieces of food located on the San Mateo trail.
Terminal set without ADFs: (RIGHT), (LEFT), and (MOVE).
Function set without ADFs: IF-FOOD-AHEAD and PROGN.
Fitness cases: 9 fitness cases, each consisting of a 13-by-13 grid with food in some squares.
Raw fitness: The sum, over the nine fitness cases, of the food eaten within the allowed amount of time for each fitness case.
Standardized fitness: The total amount of food (i.e., 96) minus raw fitness.
Hits: Same as raw fitness.
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program scores the maximum number of hits.
The version of the problem presented here differs from the earlier versions
of this problem in that there are nine fitness cases (instead of one overall trail);
in that the grid is bounded (rather than toroidal) and touching the boundary
is lethal; and in that the execution of a MOVE onto a square containing food
throws execution of a program back to the beginning of that program.
Table 12.1 summarizes the key features of the artificial ant problem for the
San Mateo trail without automatically defined functions.
12.3 RESULTS WITHOUT ADFs
The following 95-point individual collecting 96 (out of 96) pieces of food
emerged on generation 13 of one run:
(PROGN (IF-FOOD-AHEAD (PROGN (IF-FOOD-AHEAD (MOVE) (RIGHT))
                             (RIGHT))
                      (LEFT))
       (IF-FOOD-AHEAD
        (IF-FOOD-AHEAD (IF-FOOD-AHEAD (MOVE) (RIGHT)) (MOVE))
        (PROGN
         (PROGN (MOVE) (RIGHT))
         (PROGN
          (IF-FOOD-AHEAD
           (IF-FOOD-AHEAD
            (PROGN (MOVE) (RIGHT))
            (PROGN
             (PROGN (IF-FOOD-AHEAD (IF-FOOD-AHEAD (LEFT) (LEFT))
                                   (PROGN (LEFT) (MOVE)))
                    (PROGN (IF-FOOD-AHEAD (MOVE) (RIGHT))
                           (PROGN (RIGHT) (LEFT))))
             (PROGN
              (PROGN (PROGN (PROGN (LEFT) (MOVE))
                            (IF-FOOD-AHEAD (RIGHT) (LEFT)))
                     (PROGN (IF-FOOD-AHEAD (LEFT) (RIGHT))
                            (PROGN (LEFT) (LEFT))))
              (IF-FOOD-AHEAD (PROGN (PROGN (MOVE) (MOVE))
                                    (IF-FOOD-AHEAD (MOVE) (MOVE)))
                             (IF-FOOD-AHEAD (MOVE) (MOVE))))))
           (PROGN
            (PROGN (PROGN (PROGN (LEFT) (MOVE))
                          (IF-FOOD-AHEAD (RIGHT) (LEFT)))
                   (PROGN (IF-FOOD-AHEAD (LEFT) (RIGHT))
                          (PROGN (LEFT) (LEFT))))
            (IF-FOOD-AHEAD (MOVE) (LEFT))))
          (IF-FOOD-AHEAD (PROGN (MOVE) (MOVE))
                         (PROGN (PROGN (RIGHT) (MOVE)) (MOVE))))))).
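Structural complexity is measured in points, that is, the number of functions and terminals in a program. The Python sketch below shows one way such a count might be made from the printed S-expression. It is an illustration only: the helper names `parse` and `count_points` are hypothetical, and the sample is the first argument of the top-level PROGN of the program above.

```python
import re

def parse(text):
    """Parse one S-expression into nested lists of symbol strings."""
    tokens = re.findall(r"[()]|[^\s()]+", text)

    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1

    tree, end = read(0)
    assert end == len(tokens), "trailing tokens after the expression"
    return tree

def count_points(tree):
    """Count functions and terminals; a zero-argument call such as (MOVE) is one point."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_points(child) for child in tree[1:])

fragment = "(IF-FOOD-AHEAD (PROGN (IF-FOOD-AHEAD (MOVE) (RIGHT)) (RIGHT)) (LEFT))"
fragment_points = count_points(parse(fragment))
```

The same function can be applied to a complete listing in order to check it against the stated number of points.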
[Figure: performance curves P(M,i) and I(M,i,z) without defined functions, plotted against Generation.]
Figure 12.2 Performance curves for the artificial ant showing that Ewithout = 272,000
without ADFs.
The average structural complexity, Swithout, of solutions to the artificial ant
problem over 22 successful runs (out of 26) is 90.9 points without automatically defined functions.
Figure 12.2 presents the performance curves based on the 26 runs of this
problem without automatically defined functions. The cumulative probability
of success, P(M,i), is 69% by generation 16 and is 85% by generation 50. The
two numbers in the oval indicate that if this problem is run through to generation 16, processing a total of Ewithout = 272,000 individuals (i.e., 4,000 x 17
generations x 4 runs) is sufficient to yield a solution to this problem with 99%
probability.
See also Koza 1993b.
12.4 PREPARATORY STEPS WITH ADFs
In applying genetic programming with automatically defined functions to
the artificial ant problem, we decided that each individual overall program in
the population would consist of one function-defining branch defining a zero-argument function called ADF0 and one result-producing branch.
We first consider the function-defining branch.
The terminal set, Tadf, for the zero-argument defined function ADF0 consists of
Tadf = { (RIGHT), (LEFT), (MOVE) }.
The function set, Fadf, for the zero-argument defined function ADF0 is
Fadf = { IF-FOOD-AHEAD, PROGN }
with an argument map of
{2, 2}.
Table 12.2 Tableau with ADFs for the artificial ant problem.
Objective: Find a program to control an artificial ant so that it can find all 96 pieces of food located on the San Mateo trail.
Architecture of the overall program with ADFs: One result-producing branch and one zero-argument function-defining branch.
Parameters: Branch typing.
Terminal set for the result-producing branch: (RIGHT), (LEFT), and (MOVE).
Function set for the result-producing branch: ADF0, IF-FOOD-AHEAD, and PROGN.
Terminal set for the function-defining branch ADF0: (RIGHT), (LEFT), and (MOVE).
Function set for the function-defining branch ADF0: IF-FOOD-AHEAD and PROGN.
The body of ADF0 is a composition of primitive functions from the function set, Fadf, and terminals from the terminal set, Tadf.
We now consider the result-producing branch.
The terminal set, Trpb, for the result-producing branch is
Trpb = { (RIGHT), (LEFT), (MOVE) }.
The function set, Frpb, for the result-producing branch is
Frpb = { ADF0, IF-FOOD-AHEAD, PROGN }
with an argument map of
{0, 2, 2}.
The result-producing branch is a composition of the functions from the
function set, Frpb, and terminals from the terminal set, Trpb.
Table 12.2 summarizes the key features of the arfficial ant problem for the
San Mateo trail with automaticallv defined functions.
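The effect of branch typing can be illustrated with a small validity check. The sketch below is not taken from the book's implementation: it assumes that a program tree is stored as nested Python tuples, and the names FUNCTIONS, TERMINALS, and is_valid are hypothetical. It uses the two-argument PROGN given by the argument maps above.

```python
# Primitive sets per branch (name -> number of arguments), as listed in the text.
FUNCTIONS = {
    "adf0": {"IF-FOOD-AHEAD": 2, "PROGN": 2},
    "rpb": {"ADF0": 0, "IF-FOOD-AHEAD": 2, "PROGN": 2},
}
TERMINALS = {
    "adf0": {"RIGHT", "LEFT", "MOVE"},
    "rpb": {"RIGHT", "LEFT", "MOVE"},
}


def is_valid(tree, branch):
    """Return True if `tree` uses only the primitives allowed in `branch`.

    A tree is a tuple (name, child, child, ...); terminals and the
    zero-argument ADF0 are one-element tuples such as ("MOVE",).
    """
    name, children = tree[0], tree[1:]
    if name in TERMINALS[branch]:
        return len(children) == 0
    arity = FUNCTIONS[branch].get(name)
    if arity is None or arity != len(children):
        return False
    return all(is_valid(child, branch) for child in children)


# ADF0 may be called from the result-producing branch but not from its own body.
call_adf = ("PROGN", ("MOVE",), ("ADF0",))
print(is_valid(call_adf, "rpb"), is_valid(call_adf, "adf0"))
# prints: True False
```

Keeping one primitive set per branch is what allows crossover and random generation to respect the architecture of the overall program.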
12.5 RESULTS WITH ADFs
In one run, about half (2,0M of the 4,000) of the individuals in generation 0
score 0 in their search for food over the nine parts of the San Mateo trail.
Many of these individuals turn and look, but are immobile; others turn away
from the trail and run into the electric fence without encountering any food.
Another 20% (868) score 18 out of 96 because there are, over the nine parts of
the trail, 18 pieces of food available to a program that merely moves south
whenever food is present to the south. About 1% of the 4,000 individuals
score between 54 and 72.
Figure 12.3 Fitness curves for the artificial ant problem with ADFs (fitness
against generation for the worst-of-generation, average, and best-of-generation
values).
Figure 12.3 shows, by generation, the fitness of the best-of-generation
program. As can be seen, the fitness of the best-of-generation program and the
average fitness of the population as a whole both tend to improve (i.e., drop)
from generation to generation.
Figure 12.4 shows the hits histograms for generations 0, 2, 5, and 7 of this
run. The "slinky" left-to-right undulating movement of both the high point
and the center of mass of these histograms reflects the improvement of the
population as a whole.
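A hits histogram of this kind can be tabulated as follows. This is a minimal sketch, assuming the bins 0-11, 12-23, ..., 84-95 plus a separate bin for the perfect score of 96; the function names and the sample data are hypothetical.

```python
def hits_histogram(hits, max_hits=96, bin_width=12):
    """Count individuals per hits bin; the perfect score gets its own bin."""
    n_bins = max_hits // bin_width          # 8 bins: 0-11, 12-23, ..., 84-95
    counts = [0] * (n_bins + 1)             # final slot holds hits == max_hits
    for h in hits:
        if not 0 <= h <= max_hits:
            raise ValueError(f"hits value out of range: {h}")
        counts[min(h // bin_width, n_bins)] += 1
    return counts


def bin_labels(max_hits=96, bin_width=12):
    """Labels matching the slots returned by hits_histogram."""
    labels = [f"{lo}-{lo + bin_width - 1}" for lo in range(0, max_hits, bin_width)]
    return labels + [str(max_hits)]


# Hypothetical hits values for six individuals.
sample = [0, 0, 18, 18, 60, 96]
print(dict(zip(bin_labels(), hits_histogram(sample))))
```

Plotting these counts for successive generations gives the left-to-right movement described above.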
Figure 12.5 shows the structural complexity curves for this run of the
artificial ant problem with automatically defined functions. The figure shows, by
generation, the structural complexity in the best-of-generation program and
the average of the values of structural complexity for all the programs in the
population.
In generation 7 of this run, the following 100%-correct solution to the
artificial ant problem emerged:
(progn (defun ADF0 ()
  (values (PROGN (IF-FOOD-AHEAD (IF-FOOD-AHEAD (MOVE)
  (RIGHT)) (PROGN (LEFT) (MOVE))) (PROGN (IF-FOOD-AHEAD
  (IF-FOOD-AHEAD (MOVE) (LEFT)) (PROGN (PROGN (RIGHT)
  (LEFT)) (PROGN (LEFT) (MOVE)))) (IF-FOOD-AHEAD (LEFT)
  (RIGHT))))))
  (values (PROGN (PROGN (MOVE) (ADF0)) (PROGN (IF-FOOD-AHEAD
  (MOVE) (MOVE)) (PROGN (ADF0) (ADF0)))))).
In this program, ADF0 is invoked three times from the result-producing
branch. The result-producing branch serves to reposition the ant just before
the first and second invocations of ADF0.
Figure 12.6 shows the trajectory for the ninth fitness case of the ant for this
run. In this figure (and the following figures in this section), the light lines
represent movements initiated by the result-producing branch of the program;
the heavy lines indicate movements initiated by the automatically
defined function ADF0. The figure is suggestive of the reuse of a semicircular
counterclockwise inspecting motion.
Figure 12.4 Hits histogram for the artificial ant problem with ADFs (panels
for generations 0, 2, 5, and 7; frequency against hits).
Figure 12.5 Structural complexity curves for the artificial ant problem with ADFs.
Figure 12.6 Trajectory of the artificial ant for the ninth fitness case of the
best-of-run program from generation 7 with ADFs.
The best-of-run individual from generation 7 of run 1 can be simplified
to the following:
(progn (defun ADF0 ()
         (values (IF-FOOD-AHEAD
                   (MOVE)                              ; a
                   (PROGN (LEFT) (MOVE)                ; b
                          (IF-FOOD-AHEAD
                            (MOVE)                     ; c
                            (PROGN (LEFT) (MOVE)       ; d
                                   (IF-FOOD-AHEAD (LEFT)
                                                  (RIGHT))))))))
       (values (PROGN (MOVE) (ADF0) (MOVE) (ADF0) (ADF0))))
Figure 12.7 shows the trajectory of the artificial ant executing this
semicircular counterclockwise inspecting motion specified by the best-of-run
individual from generation 7 of run 1. For simplicity, this figure shows only part
of the 13-by-13 grid and contains food in only four squares. As usual, the ant
starts at the circle in the top row.
Since the ant encounters food on each of its first four downward movements,
evaluation of the program terminates upon execution of the first MOVE
operation (labeled 1) in the result-producing branch. The four places on the
trajectory where this occurs are also labeled 1.
The remainder of the trajectory shown in figure 12.7 represents three
evaluations of the program. These three executions occur in the absence of any
food. Each circle denotes the ant's exit from one invocation of ADF0. The two
small, filled circles (labeled E) denote the ant's exit from the first and second
of the three evaluations of the program. The large filled circle denotes the
ant's exit from the third evaluation of the program.
The lines labeled 2 in figure 12.7 denote movements caused by the second
MOVE operation of the result-producing branch.
Points in the figure labeled with capital letters (P, Q, or R) denote
invocations of ADF0 by the result-producing branch.
All the bold lines in the figure denote movements caused by the MOVE
operations on lines b and d of ADF0.
This solution is a hierarchical decomposition of the problem. Genetic
programming discovered a decomposition of the overall problem into a reusable
subroutine for performing an inspecting motion. Genetic programming
simultaneously evolved the sequence of sensor tests, turns, and moves to
implement this inspecting motion. Finally, genetic programming simultaneously
evolved stage-setting sensor tests, turns, and moves and assembled
the inspecting motions into a solution of the overall problem.
The above program for the artificial-ant problem illustrates two of
the five ways itemized in chapter 3 in which the hierarchical problem-solving
approach can be beneficial: hierarchical decomposition and identical reuse.
Figure 12.7 Trajectory of the artificial ant for the ninth fitness case of run 1,
showing its semicircular counterclockwise inspecting motion with ADFs.
Hierarchical decomposition is evident because the overall program for
solving the problem consists of an automatically defined function, ADF0, as well
as a result-producing branch.
In addition, the three times that the result-producing branch invokes ADF0
illustrate the identical reuse of the solution to a subproblem.
Of course, genetic programming produces a variety of different programs
in different runs. For example, in a second successful run of this problem, all
of the ant's actual movements are controlled by ADF0, as opposed to the
result-producing branch. An examination shows that there are no MOVE
operations in the result-producing branch of the 44-point program scoring 96 (out
of 96) from generation 5 of run 2:
(progn (defun ADF0 ()
  (values (PROGN (PROGN (PROGN (PROGN (PROGN (MOVE)
  (MOVE)) (IF-FOOD-AHEAD (MOVE) (LEFT))) (PROGN (PROGN
  (IF-FOOD-AHEAD (IF-FOOD-AHEAD (MOVE) (RIGHT))
  (IF-FOOD-AHEAD (MOVE) (LEFT))) (IF-FOOD-AHEAD (RIGHT)
  (PROGN (LEFT) (LEFT)))) (IF-FOOD-AHEAD (MOVE)
  (RIGHT)))) (PROGN (MOVE) (RIGHT))) (IF-FOOD-AHEAD
  (MOVE) (PROGN (RIGHT) (MOVE))))))
  (values (PROGN (PROGN (PROGN (ADF0) (LEFT)) (ADF0))
  (PROGN (RIGHT) (LEFT))))).
Figure 12.8 Trajectory of the artificial ant for the ninth fitness case of run 2,
where all movements are controlled by ADF0.
Figure 12.8 shows the trajectory for the ninth fitness case of the ant where
all movements are controlled by ADF0. This entire trajectory is shown with a
line.
In a third successful run, the following 34-point program scoring 96 (out of
96) emerged in generation 8:
(progn (defun ADF0 ()
  (values (PROGN (PROGN (PROGN (IF-FOOD-AHEAD (MOVE)
  (LEFT)) (PROGN (MOVE) (IF-FOOD-AHEAD (MOVE) (LEFT))))
  (IF-FOOD-AHEAD (MOVE) (RIGHT))) (PROGN (IF-FOOD-AHEAD
  (MOVE) (RIGHT)) (RIGHT)))))
  (values (PROGN (IF-FOOD-AHEAD (MOVE) (LEFT)) (PROGN
  (PROGN (MOVE) (PROGN (ADF0) (MOVE))) (PROGN (PROGN
  (ADF0) (MOVE)) (MOVE)))))).
Figure 12.9 shows the trajectory for the ninth fitness case of the ant while it
is under the control of this 34-point program from run 3. In this solution, the
food is primarily eaten under the control of the result-producing branch
(although the ant spends about half of its time under the control of the function-defining branch).
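The point counts quoted for these programs (for example, 34 points for the program from run 3) can be checked mechanically. The sketch below is an illustration rather than the book's tooling: it assumes that structural complexity is the number of functions and terminals in the bodies of all branches, so that the outer progn, the defun header, and the values wrappers are not counted. The helper names are hypothetical.

```python
def parse(text):
    """Parse a LISP S-expression into nested Python lists of symbol strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        return items, pos + 1

    tree, _ = read(0)
    return tree


def points(expr):
    """Count functions and terminals: one point per call, plus its arguments."""
    if isinstance(expr, str):      # a bare symbol such as ARG0
        return 1
    return 1 + sum(points(arg) for arg in expr[1:])


def structural_complexity(program):
    """Sum the points in the body of every branch (each `values` form)."""
    total = 0
    for form in program[1:]:                    # skip the outer progn
        values_form = form[-1] if form[0] == "defun" else form
        total += points(values_form[1])
    return total


RUN3 = """
(progn (defun ADF0 ()
  (values (PROGN (PROGN (PROGN (IF-FOOD-AHEAD (MOVE) (LEFT))
                               (PROGN (MOVE) (IF-FOOD-AHEAD (MOVE) (LEFT))))
                        (IF-FOOD-AHEAD (MOVE) (RIGHT)))
                 (PROGN (IF-FOOD-AHEAD (MOVE) (RIGHT)) (RIGHT)))))
  (values (PROGN (IF-FOOD-AHEAD (MOVE) (LEFT))
                 (PROGN (PROGN (MOVE) (PROGN (ADF0) (MOVE)))
                        (PROGN (PROGN (ADF0) (MOVE)) (MOVE))))))
"""

print(structural_complexity(parse(RUN3)))
# prints: 34
```

Here the function-defining branch contributes 19 points and the result-producing branch 15, for the total of 34.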
Figure 12.9 Trajectory of the artificial ant for the ninth fitness case of run 3,
where movements are primarily controlled by the result-producing branch.
The average structural complexity, S_with, of solutions to the artificial ant
problem over 19 successful runs (out of 19 runs) is 71.7 points with
automatically defined functions.
Figure 12.10 presents the performance curves based on the 19 runs of the
artificial ant problem with automatically defined functions. The cumulative
probability of success, P(M,i), is 95% by generation 16 and is 100% by
generation 33. The two numbers in the oval indicate that if this problem is run
through to generation 16, processing a total of E_with = 136,000 individuals
(i.e., 4,000 x 17 generations x 2 runs) is sufficient to yield a solution to this
problem with 99% probability.
Figure 12.10 Performance curves for the artificial ant showing that
E_with = 136,000 with ADFs.
Table 12.3 Comparison table for the artificial ant problem.
                                   Without ADFs   With ADFs
Average structural complexity S    90.9           71.7
Computational effort E             272,000        136,000
Figure 12.11 Summary graphs for the artificial ant problem.
12.6 SUMMARY
Table 12.3 compares the average structural complexity, S_without and S_with, and
the computational effort, E_without and E_with, for the artificial ant problem with
automatically defined functions and without them.
Figure 12.11 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 1.27 and an efficiency ratio, R_E, of 2.00.
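Both ratios follow directly from the entries of the comparison table. The sketch below assumes that each ratio is the value without automatically defined functions divided by the value with them, which is consistent with the figures quoted; the function name is hypothetical.

```python
def ratio(without_adfs, with_adfs):
    """Ratio of a measure without ADFs to the same measure with ADFs."""
    return without_adfs / with_adfs


structural_complexity_ratio = ratio(90.9, 71.7)      # R_S
efficiency_ratio = ratio(272_000, 136_000)           # R_E
print(f"R_S = {structural_complexity_ratio:.2f}, R_E = {efficiency_ratio:.2f}")
# prints: R_S = 1.27, R_E = 2.00
```

A ratio above 1 means that the version with automatically defined functions is smaller (for R_S) or needs fewer individuals to be processed (for R_E).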
13 Obstacle-Avoiding Robot
As previously mentioned, one of our design considerations in creating the
lawnmower problem was that it be amenable to scaling both in terms of the
size of the grid and the complexity of the problem itself. Chapter 8 has already
explored the scaling of the lawnmower problem along the axis representing
lawn size. The obstacle-avoiding-robot problem considered in this chapter
scales the lawnmower problem along the axis of problem complexity. The
environment of this problem is more complicated in that obstacles disrupt
the homogeneity of the grid and prevent the straightforward exploitation of
the environment.
13.1 THE PROBLEM
In this problem, an autonomous mobile robot attempts to mop the floor in
a room containing harmless but time-wasting obstacles (posts). The
obstacles do not harm the robot, but every failed move or jump counts
toward the overall limitation on the number of operations available for
the task.
As was the case in the lawnmower problem, the state of the robot consists
of its location in the room and the direction in which it is facing. Each square
in the room is uniquely identified by a vector of integers modulo 8 of the
form (i,j), where 0 <= i, j <= 7. The robot starts at location (4,4), facing north.
The room is toroidal, so that whenever the robot moves off the edge of the room it
reappears on the opposite side.
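The wrap-around behavior of the toroidal room amounts to arithmetic modulo 8 on both coordinates. The following sketch assumes that the first component of a location is the horizontal coordinate and that the second grows downward; the text does not fix this convention, and the names add_mod8, step, and HEADINGS are hypothetical.

```python
SIZE = 8  # the room is an 8-by-8 grid

# Unit steps for the four headings (assumed convention: second coordinate grows downward).
HEADINGS = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}


def add_mod8(a, b):
    """Add two vectors component-wise modulo 8."""
    return ((a[0] + b[0]) % SIZE, (a[1] + b[1]) % SIZE)


def step(position, heading):
    """Move one square in the given heading, wrapping around the edges."""
    return add_mod8(position, HEADINGS[heading])


print(step((4, 4), "north"))   # prints: (4, 3)
print(step((4, 0), "north"))   # prints: (4, 7)
```

The second call shows a robot leaving the top edge and reappearing at the bottom of the same column.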
Six non-touching obstacles are randomly positioned in a room laid out on
an 8-by-8 grid. Figure 13.1 shows two typical rooms. The origin (0,0) is in the
upper left corner. The numbering of the squares increases going down and
going to the right.
The robot is capable of turning left, of moving forward one square
in the direction in which it is currently facing, and of jumping by a
specified displacement in the vertical and horizontal directions. Whenever
the robot succeeds in moving onto a new square (by means of either
a single move or a jump), it mops the location of the floor onto which
it moves.
Figure 13.1 Two rooms, each with six posts, with the obstacle-avoiding robot in its
starting location.
13.2 PREPARATORY STEPS WITHOUT ADFs
The operators for this problem are similar, but not identical, to the operators
in the lawnmower problem.
The operator MOP takes no arguments and moves the robot in the direction
it is currently facing and mops the location of the floor onto which it is
moving. MOP does not change the orientation of the robot. To ensure
closure, MOP returns the vector value (0,0). When the MOP operator attempts
to move the robot to a location occupied by an obstacle, the robot does not
move; however, the attempted MOP counts toward the overall limit on the
number of operations that may be executed in just the same way as a
successful MOP does.
FROG is a one-argument operator that causes the robot to move relative to
the direction it is currently facing by the amount specified by its vector
argument. FROG does not change the orientation of the robot. To ensure
closure, FROG acts as the identity operator on its argument. If the FROG
operator attempts to move the robot to a location occupied by an obstacle, the
FROG fails in the same way as the MOP operator.
The operator LEFT takes no arguments and is identical to the LEFT operator
of the lawnmower problem. It changes the orientation of the robot by
turning the robot to the left by 90 degrees (without moving it). To ensure closure,
LEFT returns the vector value (0,0).
The two-argument IF-OBSTACLE conditional branching operator executes
its first argument if an obstacle is immediately in front of the robot in the
direction the robot is currently facing, but otherwise executes its second
argument. This operator enables the robot to avoid time-wasting attempts to move
to a location occupied by an obstacle. Since there are side-effecting functions
in this problem, IF-OBSTACLE must be implemented as a macro as described
in section 12.2.
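The reason a conditional over side-effecting arguments cannot be an ordinary function can be shown outside LISP as well. The sketch below is only an analogy: Python has no macros, so the macro-style version takes its two branches as unevaluated callables. The Robot class is a hypothetical stand-in that merely logs which operators ran.

```python
class Robot:
    """Minimal stand-in for the robot: it only records which operators ran."""

    def __init__(self, obstacle_ahead):
        self.obstacle_ahead = obstacle_ahead
        self.log = []

    def mop(self):
        self.log.append("MOP")
        return (0, 0)

    def left(self):
        self.log.append("LEFT")
        return (0, 0)


def if_obstacle_eager(robot, then_value, else_value):
    """Function-style version: both arguments were already evaluated by the caller."""
    return then_value if robot.obstacle_ahead else else_value


def if_obstacle_lazy(robot, then_branch, else_branch):
    """Macro-style version: only the selected branch is evaluated."""
    return then_branch() if robot.obstacle_ahead else else_branch()


eager = Robot(obstacle_ahead=True)
if_obstacle_eager(eager, eager.left(), eager.mop())   # runs LEFT and MOP
lazy = Robot(obstacle_ahead=True)
if_obstacle_lazy(lazy, lazy.left, lazy.mop)           # runs LEFT only
print(eager.log, lazy.log)
# prints: ['LEFT', 'MOP'] ['LEFT']
```

In the eager version the robot attempts a MOP into the obstacle even though the test said to turn, which is exactly the wasted operation the conditional is meant to prevent.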
V8A is the two-argument addition function for vectors of integers modulo
8 and is identical to the V8A function of the lawnmower problem.
The terminal set for this problem consists of the two side-effecting
zero-argument operators and random vector constants modulo 8, R_v8:
T = { (LEFT), (MOP), R_v8 }.
The function set consists of
F = { IF-OBSTACLE, V8A, FROG, PROGN }
with an argument map of
{2, 2, 1, 2}.
Two fitness cases are used for this problem. With six obstacles (and 58
unobstructed squares) in the room for each of the two fitness cases, raw
fitness ranges between 0 and 116. A program in the population is executed once
for each fitness case. The movement of the robot is terminated when the robot
has executed either 100 LEFT turns or 100 movement-causing operations (i.e.,
a MOP or FROG) for a particular fitness case. Execution of IF-OBSTACLE,
(LEFT), PROGN, and V8A does not count toward this limit.
The contribution to raw fitness of a program by a particular fitness case is
the number of squares (from 0 to 58) mopped within the allowed time. The
raw fitness of a program is the sum, over the two fitness cases, of the number
of squares mopped. Only the total number of squares mopped over both
fitness cases is available to genetic programming.
The use of numerous fitness cases is desirable for this problem in order to
avoid overspecialization of the evolved programs to a particular arrangement
of obstacles. Each run of this problem is fairly time-consuming. As usual,
many runs of a problem must be made, both with and without automatically
defined functions, in order to compute the structural complexity ratio for the
problem and to make the performance curves that yield the efficiency ratio
for the problem. The goal of exploring whether automatically defined functions facilitate automated problem-solving is more important to us in this
book than the goal of finding the very best solution or most general solution
to a particular problem. These competing goals dictate that a compromise be
made for this problem. We decided to allocate only enough computer time to
this problem to support two fitness cases.
This problem requires that the robot test for the presence of an obstacle
prior to most (but not necessarily all) of its contemplated moving or jumping
operations. Execution of a test does not count toward the 100 state-changing
operations.
This problem is similar to, but considerably harder than, the lawnmower
problem (where a population of only 1,000 was used). Consequently, a population size of 4,000 is used here.
Because this problem is harder than the lawnmower problem, we defined
mopping 112 of the 116 squares to be a satisfactory result for this problem.
This change increases the percentage of successful runs and shortens the
average length of the successful runs (in generations); however, it prevents
direct comparison of the results of this problem and the results of the
lawnmower problem.
Table 13.1 summarizes the key features of the obstacle-avoiding-robot problem
without automatically defined functions.
Table 13.1 Tableau without ADFs for the obstacle-avoiding-robot problem.
Objective: Find a program to control an autonomous mobile robot
so that the robot mops all 58 free squares of the floor in a room.
Terminal set without ADFs: (LEFT), (MOP), and the random constants R_v8.
Function set without ADFs: IF-OBSTACLE, V8A, FROG, and PROGN.
Fitness cases: Two fitness cases, each with obstacles in 6 of the 64
squares of the room.
Raw fitness: Raw fitness (from 0 to 116) is the sum, over the two
fitness cases, of the number of squares in the room mopped by the robot
within the allowed amount of time.
Standardized fitness: Standardized fitness is the total number of squares to
be mopped (i.e., 116) minus raw fitness.
Hits: Same as raw fitness.
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program scores 112 (out of 116) hits.
D'haeseleer (1994) uses the obstacle-avoiding-robot problem as one of four
problems for testing his new context-preserving crossover operation. See
subsection F.13.1 in appendix F.
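The fitness measures and the success predicate for this problem reduce to a few lines of arithmetic. The sketch below uses the values stated for this problem (58 free squares per fitness case, two fitness cases, a success threshold of 112); the function and constant names are hypothetical.

```python
FREE_SQUARES_PER_CASE = 58     # 64 squares minus 6 obstacles
NUM_FITNESS_CASES = 2
MAX_RAW_FITNESS = FREE_SQUARES_PER_CASE * NUM_FITNESS_CASES   # 116
SUCCESS_THRESHOLD = 112


def raw_fitness(mopped_per_case):
    """Sum of the squares mopped over the fitness cases (also the hits)."""
    if len(mopped_per_case) != NUM_FITNESS_CASES:
        raise ValueError("expected one count per fitness case")
    for count in mopped_per_case:
        if not 0 <= count <= FREE_SQUARES_PER_CASE:
            raise ValueError(f"count out of range: {count}")
    return sum(mopped_per_case)


def standardized_fitness(raw):
    """Zero is best: squares left unmopped over both fitness cases."""
    return MAX_RAW_FITNESS - raw


def is_success(raw):
    """Success predicate: at least 112 of the 116 squares are mopped."""
    return raw >= SUCCESS_THRESHOLD


raw = raw_fitness([58, 54])
print(raw, standardized_fitness(raw), is_success(raw))
# prints: 112 4 True
```

Only the sum enters the fitness, so a program receives no information about which of the two rooms it handled less well.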
13.3 RESULTS WITHOUT ADFs
In one run without automatically defined functions, the following 330-point
program scoring 112 (out of 116) emerged on generation 33:
(V8A (PROGN (PROGN (FROG (V8A (FROG (V8A (V8A (PROGN (FROG
(4,5)) (IF-OBSTACLE (V8A (PROGN (MOP) (MOP)) (LEFT)) (PROGN
(MOP) (MOP)))) (PROGN (V8A (LEFT) (MOP)) (PROGN (LEFT) (MOP))))
(PROGN (V8A (PROGN (MOP) (LEFT)) (PROGN (LEFT) (LEFT))) (PROGN
(PROGN (MOP) (LEFT)) (V8A (MOP) (5,5)))))) (MOP))) (IF-OBSTACLE
(2,2) (MOP))) (PROGN (FROG (V8A (PROGN (FROG (V8A (V8A (PROGN
(MOP) (MOP)) (FROG (3,0))) (PROGN (PROGN (V8A (MOP) (MOP))
(PROGN (4,1) (PROGN (MOP) (MOP)))) (PROGN (LEFT) (LEFT))))) (V8A
(PROGN (PROGN (MOP) (LEFT)) (V8A (MOP) (MOP))) (PROGN (V8A
(LEFT) (MOP)) (IF-OBSTACLE (MOP) (3,6))))) (MOP))) (V8A (PROGN (MOP)
(MOP)) (FROG (MOP))))) (V8A (V8A (PROGN (V8A (MOP) (MOP)) (PROGN (PROGN
(V8A (PROGN (4,2) (MOP)) (PROGN (2,6) (MOP))) (V8A (V8A (PROGN (FROG
(4,5)) (PROGN (V8A (PROGN (MOP) (LEFT)) (PROGN (LEFT) (LEFT))) (PROGN
(PROGN (MOP) (LEFT)) (PROGN (MOP) (FROG (5,0)))))) (PROGN (FROG (PROGN
(PROGN (MOP) (MOP)) (PROGN (6,5) (MOP)))) (PROGN (LEFT) (MOP)))) (PROGN
(V8A (V8A (PROGN (MOP) (MOP)) (LEFT)) (PROGN (LEFT) (LEFT))) (PROGN
(PROGN (MOP) (LEFT)) (V8A (FROG (V8A (MOP) (1,1))) (MOP)))))) (PROGN (V8A
(PROGN (MOP) (MOP)) (FROG (MOP))) (PROGN (PROGN (IF-OBSTACLE
(IF-OBSTACLE (MOP) (LEFT)) (V8A (MOP) (MOP))) (PROGN (V8A (MOP)
(3,0)) (V8A (PROGN (MOP) (MOP)) (PROGN (MOP) (0,7))))) (FROG
(FROG (V8A (V8A (FROG (V8A (4,1) (MOP))) (MOP)) (MOP))))))))
(V8A (V8A (PROGN (V8A (MOP) (1,1)) (IF-OBSTACLE (5,0) (PROGN
(MOP) (MOP)))) (PROGN (PROGN (V8A (LEFT) (3,0)) (V8A (FROG
(IF-OBSTACLE (2,3) (MOP))) (FROG (V8A (FROG (PROGN (PROGN (MOP)
(LEFT)) (V8A (MOP) (5,5)))) (MOP))))) (FROG (FROG (3,2)))))
(PROGN (MOP) (2,3)))) (PROGN (V8A (V8A (MOP) (MOP)) (MOP))
(PROGN (V8A (V8A (V8A (IF-OBSTACLE (MOP) (MOP)) (FROG (MOP)))
(V8A (FROG (V8A (MOP) (1,1))) (MOP))) (V8A (PROGN (V8A (PROGN
(PROGN (MOP) (MOP)) (IF-OBSTACLE (IF-OBSTACLE (V8A (LEFT) (6,4))
(PROGN (2,4) (LEFT))) (PROGN (IF-OBSTACLE (0,5) (MOP)) (PROGN
(MOP) (MOP))))) (V8A (6,4) (LEFT))) (FROG (6,0))) (PROGN (PROGN
(V8A (PROGN (MOP) (LEFT)) (PROGN (LEFT) (LEFT))) (V8A (MOP) (V8A
(PROGN (MOP) (LEFT)) (V8A (MOP) (1,1))))) (PROGN (PROGN (MOP)
(MOP)) (PROGN (6,5) (MOP)))))) (V8A (FROG (FROG (V8A (MOP)
(1,1)))) (V8A (V8A (PROGN (MOP) (MOP)) (IF-OBSTACLE (1,5)
(LEFT))) (IF-OBSTACLE (IF-OBSTACLE (MOP) (LEFT)) (V8A (MOP)
(MOP))))))))).
As one would expect, this best-of-run program consists of a tedious sequence
of irregular movements, jumps, and turns that eventually mops 112 of the 116
squares of the room. It also contains a sufficient number of tests for obstacles
to permit the attainment of this score of 112 within the constraints on the
number of operations.
Figure 13.2 shows, for the first fitness case, the partial trajectory traced by
the robot while it is under the control of this 330-point best-of-run program
for operations 0 through 30; figure 13.3 shows the partial trajectory for
operations 30 through 60; and figure 13.4 shows the partial trajectory for operations
60 through 91.
Even though the problem environment contains considerable regularity,
this 330-point program without automatically defined functions necessarily
operates in an irregular and haphazard fashion, with no common
approach visible among the various parts of the overall 91-operation
trajectory.
The average structural complexity, S_without, of the best-of-run programs from
the seven successful runs (out of 10 runs) of the obstacle-avoiding robot
without automatically defined functions is 336.1 points.
Figure 13.2 Partial trajectory of the obstacle-avoiding robot executing the 330-point program
for operations 0 through 30 without ADFs.
Figure 13.3 Partial trajectory of the obstacle-avoiding robot executing the 330-point program
for operations 31 through 60 without ADFs.
Figure 13.4 Partial trajectory of the obstacle-avoiding robot executing the 330-point program
for operations 61 through 91 without ADFs.
Figure 13.5 Performance curves for the obstacle-avoiding-robot problem showing that
E_without = 784,000 without ADFs.
Figure 13.5 presents the performance curves based on the 10 runs of the
obstacle-avoiding-robot problem without automatically defined functions. The
cumulative probability of success, P(M,i), is 70% by generations 48 and 50.
The two numbers in the oval indicate that if this problem is run through to
generation 48, processing a total of E_without = 784,000 individuals (i.e., 4,000 x
49 generations x 4 runs) is sufficient to yield a satisfactory result for this
problem with 99% probability.
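The way a set of runs yields the performance curves and the computational effort can be sketched as follows. The run data here are hypothetical: they were chosen only to agree with the two points quoted for this problem (the first success at generation 33 and 70% success by generation 48), and the function names are assumptions. The sketch takes the computational effort to be the minimum of I(M,i,z) = M x (i + 1) x R(z) over the generations.

```python
import math


def cumulative_success(first_success_gens, generation):
    """P(M, i): fraction of runs that succeeded on or before `generation`."""
    hits = sum(1 for g in first_success_gens if g is not None and g <= generation)
    return hits / len(first_success_gens)


def individuals_to_process(pop_size, generation, p_success, z=0.99):
    """I(M, i, z) = M * (i + 1) * R(z), or None if no run has succeeded yet."""
    if p_success <= 0.0:
        return None
    if p_success >= 1.0:
        runs = 1
    else:
        runs = max(1, math.ceil(math.log(1 - z) / math.log(1 - p_success) - 1e-9))
    return pop_size * (generation + 1) * runs


def computational_effort(first_success_gens, pop_size, max_generation, z=0.99):
    """Return (best generation, minimum I(M, i, z)) over generations 0..max_generation."""
    best = None
    for i in range(max_generation + 1):
        p = cumulative_success(first_success_gens, i)
        effort = individuals_to_process(pop_size, i, p, z)
        if effort is not None and (best is None or effort < best[1]):
            best = (i, effort)
    return best


# Hypothetical data: generation of first success for 10 runs (None = never succeeded).
runs = [33, 36, 40, 41, 44, 47, 48, None, None, None]
print(computational_effort(runs, pop_size=4000, max_generation=50))
# prints: (48, 784000)
```

Stopping earlier than generation 48 would require more independent runs, and stopping later would process more generations per run without reducing the number of runs, so the minimum falls at generation 48.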
13.4 PREPARATORY STEPS WITH ADFs
A human programmer would never consider solving this problem using the
tedious style employed by the genetically evolved program without
automatically defined functions in the previous section. Instead, a human
programmer would write a program that first tests a certain small subarea of the
room for time-consuming obstacles in some orderly way and then mops that
small subarea in some orderly way. The human programmer would then
reposition the robot to a new subarea of the room in some orderly (probably
tessellating) way, and then repeat the testing and mopping actions in the new
subarea of the room. The program would contain enough invocations of the
orderly method for dealing with subareas of the room so as to mop at least
the requisite 112 squares within the allowed number of operations. That is, a
human programmer would exploit the considerable regularity of the problem
environment by decomposing the problem into subproblems and would
then repeatedly invoke the solution to the subproblem in order to solve the
overall problem.
In applying genetic programming with automatically defined functions
to this problem, we used the same arrangement of ADFs used in the
lawnmower problem. Specifically, we decided that each individual in the
population would consist of one result-producing branch and two function
definitions in which ADF0 takes no arguments and ADF1 takes one
argument. The second defined function ADF1 can hierarchically refer to
the first defined function ADF0.
Table 13.2 summarizes the key features of the obstacle-avoiding-robot problem
with automatically defined functions.
Table 13.2 Tableau with ADFs for the obstacle-avoiding-robot problem.
Objective: Find a program to control an autonomous mobile robot
so that the robot mops all 58 free squares of the floor in a room.
Architecture of the overall program with ADFs: One result-producing branch
and two function-defining branches, with ADF0 taking no arguments and
ADF1 taking one argument and with ADF1 hierarchically referring to ADF0.
Parameters: Branch typing.
Terminal set for the result-producing branch: (LEFT), (MOP), and the random
constants R_v8.
Function set for the result-producing branch: ADF0, ADF1, IF-OBSTACLE, V8A,
FROG, and PROGN.
Terminal set for the function-defining branch ADF0: (LEFT), (MOP), and the
random constants R_v8.
Function set for the function-defining branch ADF0: IF-OBSTACLE, V8A, and PROGN.
Terminal set for the function-defining branch ADF1: ARG0, (LEFT), (MOP), and
the random constants R_v8.
Function set for the function-defining branch ADF1: IF-OBSTACLE, V8A, FROG,
PROGN, and ADF0 (hierarchical reference to ADF0 by ADF1).
13.5 RESULTS WITH ADFs
In one run of this problem with automatically defined functions, the following
101-point program achieving a perfect raw fitness of 116 emerged on
generation 27:
(progn (defun ADF0 ()
  (values (PROGN (PROGN (V8A (PROGN (MOP) (MOP)) (V8A
  (PROGN (IF-OBSTACLE (5,3) (MOP)) (MOP)) (V8A (LEFT)
  (MOP)))) (V8A (V8A (LEFT) (LEFT)) (V8A (LEFT) (1,0))))
  (V8A (PROGN (MOP) (MOP)) (IF-OBSTACLE (LEFT)
  (MOP))))))
  (defun ADF1 (ARG0)
  (values (PROGN (PROGN (PROGN (PROGN (V8A (FROG ARG0)
  (PROGN (ADF0) (3,1))) (PROGN (PROGN (IF-OBSTACLE (MOP)
  (ADF0)) (V8A (ADF0) (ADF0))) (IF-OBSTACLE (PROGN (0,4)
  (ADF0)) (PROGN (PROGN (ADF0) ARG0) (V8A (ADF0) (V8A
  (PROGN (PROGN (ADF0) (V8A (ADF0) (ADF0))) (PROGN (V8A
  (V8A (ADF0) (MOP)) (V8A (ADF0) (7,7))) (PROGN (ADF0)
  (6,4)))) (FROG ARG0))))))) (PROGN (ADF0) (V8A (ADF0)
  (ADF0)))) (V8A (ADF0) (ADF0))) (PROGN (V8A (ADF0)
  (ADF0)) (V8A (ADF0) (ADF0))))))
  (values (V8A (V8A (ADF1 (7,0)) (ADF1 (ADF1 (7,0)))) (ADF1
  (ADF0))))).
The success predicate for this problem treats a score of 112 as a success for
purpose of making the performance curves, but runs with automatically
defined functions were permitted to run on in order to achieve a perfect score
of 116.
This 101-point program can be simplified to the following equivalent
57-point program:
(progn (defun ADF0 ()
         (values (PROGN (MOP) (MOP) (IF-OBSTACLE (5,3) (MOP))
                        (MOP) (LEFT) (MOP) (LEFT)
                        (LEFT) (LEFT) (MOP) (MOP) (IF-OBSTACLE
                        (LEFT) (MOP)))))
       (defun ADF1 (ARG0)
         (values (PROGN (FROG ARG0) (ADF0)
                        (IF-OBSTACLE (MOP) (ADF0)) (ADF0) (ADF0)
                        (IF-OBSTACLE (ADF0)
                          (PROGN (ADF0) (ADF0) (ADF0) (ADF0)
                                 (ADF0) (ADF0) (MOP) (ADF0)
                                 (ADF0) (FROG ARG0)))
                        (ADF0) (ADF0) (ADF0) (ADF0) (ADF0)
                        (ADF0) (ADF0) (ADF0) (ADF0))))
       (values (progn (ADF1 (7,0)) (ADF1 (ADF1 (7,0))) (ADF1
                      (ADF0))))).
Figure 13.6 Trajectory of the robot using the 101-point program for the
obstacle-avoiding-robot problem with ADFs.
Figure 13.6 shows the trajectory of the robot for this 101-point best-of-run
program with automatically defined functions. In contrast to the three partial
trajectories shown in figures 13.2, 13.3, and 13.4, this best-of-run program
takes advantage of the regularity of the problem environment by mopping
down each column and then shifting to the left. This orderly action is
interrupted from time to time by the obstacles; however, after making a slight
deviation to avoid the obstacle, the orderly mopping action immediately
resumes.
This 101-point program may be exploiting the fact that no two obstacles
happen to be in the same column. If this were so, this behavior would be the
consequence of the very small number of fitness cases. Genetic programming
adapts only to the instances of the environment to which it is exposed. If the
set of fitness cases is sufficiently representative of some more general
problem that the human user has in mind, genetic programming may evolve a
program that is also applicable to that more general problem.
Figure 13.7 Performance curves for the obstacle-avoiding-robot problem showing that
E_with = 240,000 with ADFs.
Table 13.3 Comparison table for the obstacle-avoiding-robot problem.
                                   Without ADFs   With ADFs
Average structural complexity S    336.1          123.9
Computational effort E             784,000        240,000
Figure 13.8 Summary graphs for the obstacle-avoiding-robot problem.
The average structural complexity, S_with, of the best-of-run programs of the
20 successful runs (out of 21 runs) of the problem of the obstacle-avoiding
robot with automatically defined functions is 123.9 points.
Figure 13.7 presents the performance curves based on the 21 runs of the
obstacle-avoiding-robot problem with automatically defined functions. The
cumulative probability of success, P(M,i), is 90% by generation 29 and 95%
by generation 50. The two numbers in the oval indicate that if this problem is
run through to generation 29, processing a total of E_with = 240,000 individuals
(i.e., 4,000 x 30 generations x 2 runs) is sufficient to yield a satisfactory result
for this problem with 99% probability.
13.6 SUMMARY
Table 13.3 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the problem of the
obstacle-avoiding robot with automatically defined functions and without them.
Figure 13.8 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 2.71 and an efficiency ratio, R_E, of 3.27.
14 The Minesweeper Problem
The minesweeper problem considered in this chapter is similar to the
lawnmower problem (chapter 8) and the problem of the obstacle-avoiding
robot (chapter 13); however, in this third problem of the progressiory the
obstacles are lethal.
14.1 THE PROBLEM
In this problem, a minesweeper attempts to traverse a mine-infested area of
toroidal ocean. If the crew operating the scanning equipment does not constantly check for the presence of the mines before virtually every contemplated forward movement of the ship, the ship will quickly fall victim to a
mine. When the ship hits a mine, it is destroyed and loses the opportunity to
continue its voyage and accumulate additional credit.
14.2 PREPARATORY STEPS WITHOUT ADFs
This problem is similar to the problem of the obstacle-avoiding robot, except
for the lethality of the mines. Thus, we adopt the terminal set and the function set from that problem (merely changing the name of the IF-OBSTACLE
conditional testing operation to IF-MINE and the name of the MOP operator
to SWEEP).
Since there are side-effecting functions in this problem, IF-MINE must be
implemented as a macro as described in section 12.2.
Because the conditional branching operator IF-MINE should be invoked
before every contemplated move, this problem is considerably harder to solve
than the problem of the obstacle-avoiding robot. In a small preliminary set of
test runs without automatically defined functions, genetic programming did
not evolve any program that scored 112 (the threshold used in the success
predicate in the previous problem involving the obstacle-avoiding robot).
Genetic programming did, however, find programs scoring the full 116 with
automatically defined functions. To avoid expending excessive computer time on this problem in obtaining multiple successful runs without automatically defined functions, we lowered the number of squares in
the definition of the success predicate (both with and without automatically
defined functions) to 109. This change increases the percentage of successful
runs and shortens the average length of the successful runs (in generations);
however, it prevents direct comparison of the results of this problem and the
results of the lawnmower and obstacle-avoiding robot.
The use of numerous fitness cases is desirable for this problem in order to
avoid memorization by the evolved programs of the particular arrangements
of mines in the environments that they see. However, because each run of this
problem is time-consuming, we compromised on the number of fitness cases
and allocated only enough computer time to this problem to support two
fitness cases. The mines are located in the same places as the obstacles in the
problem of the obstacle-avoiding robot.
With the differences noted above, the tableaux from the obstacle-avoiding-robot problem (tables 13.1 and 13.2) apply to this problem.
14.3 RESULTS WITHOUT ADFs
In one run without automatically defined functions, the following 340-point
program scoring 109 (out of 116) emerged on generation 50:
(V8A (V8A (V8A (PROGN (V8A (FROG (FROG (V8A (FROG (PROGN (IF-MINE (SWEEP) (SWEEP)) (PROGN (PROGN (SWEEP) (SWEEP)) (PROGN
(LEFT) (SWEEP))))) (V8A (5,3) (FROG (5,2)))))) (PROGN (V8A
(PROGN (V8A (FROG (SWEEP)) (V8A (4,6) (SWEEP))) (V8A (V8A (3,7)
(4,7)) (PROGN (FROG (V8A (FROG (3,5)) (FROG (LEFT)))) (V8A (FROG
(PROGN (SWEEP) (SWEEP))) (V8A (IF-MINE (SWEEP) (SWEEP)) (FROG
(0,2))))))) (PROGN (LEFT) (SWEEP))) (V8A (SWEEP) (FROG (V8A
(PROGN (PROGN (SWEEP) (SWEEP)) (IF-MINE (SWEEP) (LEFT))) (V8A
(PROGN (SWEEP) (LEFT)) (IF-MINE (LEFT) (SWEEP)))))))) (FROG
(2,1))) (FROG (V8A (IF-MINE (V8A (SWEEP) (SWEEP)) (V8A (FROG
(SWEEP)) (PROGN (5,4) (SWEEP)))) (V8A (FROG (V8A (PROGN (FROG
(V8A (PROGN (PROGN (SWEEP) (SWEEP)) (V8A (2,6) (SWEEP))) (V8A
(PROGN (SWEEP) (LEFT)) (IF-MINE (LEFT) (SWEEP))))) (V8A (SWEEP)
(3,3))) (FROG (SWEEP)))) (PROGN (V8A (V8A (SWEEP) (SWEEP))
(PROGN (LEFT) (SWEEP))) (V8A (FROG (SWEEP)) (V8A (PROGN (PROGN
(5,4) (SWEEP)) (V8A (FROG (FROG (6,2))) (PROGN (PROGN (LEFT)
(SWEEP)) (PROGN (SWEEP) (SWEEP))))) (IF-MINE (0,7)
(SWEEP))))))))) (SWEEP)) (PROGN (V8A (PROGN (PROGN (FROG (V8A
(FROG (3,5)) (FROG (LEFT)))) (V8A (FROG (PROGN (V8A (IF-MINE
(SWEEP) (LEFT)) (PROGN (SWEEP) (SWEEP))) (IF-MINE (SWEEP)
(SWEEP)))) (PROGN (V8A (SWEEP) (0,5)) (IF-MINE (SWEEP)
(SWEEP))))) (PROGN (5,6) (3,0))) (FROG (FROG (3,5)))) (FROG
(PROGN (V8A (IF-MINE (IF-MINE (IF-MINE (1,6) (5,5)) (PROGN (IF-MINE (SWEEP) (SWEEP)) (V8A (IF-MINE (SWEEP) (FROG (V8A (V8A
(0,6) (3,1)) (FROG (SWEEP))))) (V8A (V8A (SWEEP) (SWEEP)) (PROGN
(LEFT) (SWEEP)))))) (V8A (FROG (5,2)) (PROGN (5,4) (SWEEP))))
(V8A (FROG (V8A (PROGN (PROGN (SWEEP) (SWEEP)) (V8A (2,5)
(SWEEP))) (V8A (PROGN (SWEEP) (LEFT)) (IF-MINE (LEFT)
(SWEEP))))) (PROGN (V8A (FROG (SWEEP)) (FROG (V8A (PROGN (PROGN
(5,4) (SWEEP)) (IF-MINE (SWEEP) (LEFT))) (V8A (PROGN (SWEEP) (LEFT))
(IF-MINE (LEFT) (SWEEP)))))) (V8A (FROG (V8A (PROGN
(PROGN (SWEEP) (SWEEP)) (V8A (2,6) (SWEEP))) (V8A (PROGN (SWEEP)
(LEFT)) (IF-MINE (LEFT) (SWEEP))))) (FROG (3,5)))))) (V8A (V8A
(PROGN (SWEEP) (LEFT)) (PROGN (SWEEP) (LEFT))) (PROGN (FROG
(SWEEP)) (IF-MINE (V8A (V8A (FROG (FROG (3,5))) (PROGN (PROGN
(LEFT) (SWEEP)) (PROGN (SWEEP) (SWEEP)))) (PROGN (SWEEP)
(LEFT))) (V8A (PROGN (PROGN (FROG (V8A (FROG (3,5)) (FROG
(LEFT)))) (V8A (IF-MINE (LEFT) (SWEEP)) (V8A (IF-MINE (SWEEP)
(SWEEP)) (FROG (0,2))))) (PROGN (PROGN (PROGN (SWEEP) (LEFT))
(FROG (SWEEP))) (V8A (PROGN (SWEEP) (LEFT)) (PROGN (SWEEP)
(LEFT))))) (V8A (PROGN (SWEEP) (LEFT)) (PROGN (SWEEP)
(LEFT))))))))))).
Figure 14.1 shows a partial trajectory of this best-of-run 340-point individual
for operations 0 through 30; figure 14.2 shows the continuation of the trajectory for operations 30 through 60 of the first fitness case; figure 14.3 shows the
remainder of the trajectory for operations 60 through 84.
As can be seen from these three figures, the whole 84-operation trajectory
traced out by this 340-point program operates in a seemingly arbitrary fashion even though the problem environment contains considerable regularity.
The average structural complexity, S_without, of the best-of-run programs from
the 11 successful runs (out of 22 runs) of the minesweeper problem without
automatically defined functions is 342.4 points.
Figure 14.1 Partial trajectory of the minesweeper executing the 340-point program for operations 0 through 30 without ADFs.
Figure 14.2 Partial trajectory of the minesweeper for operations 30 through 60 without ADFs.
Figure 14.3 Partial trajectory of the minesweeper for operations 60 through 84 without ADFs.
Figure 14.4 presents the performance curves based on the 22 runs of the
minesweeper problem without automatically defined functions. The cumulative probability of success, P(M,i), is 50% by generation 50. The two numbers in the oval indicate that if this problem is run through to generation 50,
processing a total of E_without = 1,428,000 individuals (i.e., 4,000 x 51 generations x 7 runs) is sufficient to yield a satisfactory result for this problem with
99% probability.
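The arithmetic behind these effort figures can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual relation R(z) = ceil(log(1 - z) / log(1 - P(M,i))) for the number of independent runs; the function names are hypothetical and are not taken from the text.

```python
import math


def runs_required(p_success, z=0.99):
    """R(z): independent runs needed to succeed at least once with probability z.

    Assumes 0 < p_success <= 1. The small tolerance guards against
    floating-point noise when the ratio is (mathematically) a whole number.
    """
    if p_success >= 1.0:
        return 1
    ratio = math.log(1.0 - z) / math.log(1.0 - p_success)
    return math.ceil(ratio - 1e-9)


def individuals_to_process(population, generation, p_success, z=0.99):
    """I(M, i, z) = M x (i + 1) x R(z); generation 0 counts as one generation."""
    return population * (generation + 1) * runs_required(p_success, z)


# Obstacle-avoiding robot with ADFs: P(M,i) = 90% by generation 29.
robot_with_adfs = individuals_to_process(4000, 29, 0.90)

# Minesweeper without ADFs: P(M,i) = 50% by generation 50.
minesweeper_without_adfs = individuals_to_process(4000, 50, 0.50)

print(robot_with_adfs, minesweeper_without_adfs)
```

With the values quoted in the text, the sketch gives 4,000 x 30 x 2 = 240,000 and 4,000 x 51 x 7 = 1,428,000.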
14.4 PREPARATORY STEPS WITH ADFs
In applying genetic programming with automatically defined functions to
this problem, we used the same arrangement of ADFs used for the lawnmower
and the obstacle-avoiding robot. Specifically, we decided that each individual
in the population would consist of one result-producing branch and two function-defining branches. ADF0 takes no arguments and ADF1 takes one argument. The second defined function ADF1 can hierarchically refer to ADF0.
Figure 14.4 Performance curves for the minesweeper problem showing that
E_without = 1,428,000 without ADFs.
[Plot "Without Defined Functions": P(M,i) and I(M,i,z) versus generation; M = 4,000, z = 99%, R(z) = 7, N = 22.]
14.5 RESULTS WITH ADFs
In one run of this problem with automatically defined functions, the following 104-point program emerged on generation 36 with a perfect raw fitness
of 116:
(progn (defun ADF0 ()
(values (IF-MINE (V8A (IF-MINE (PROGN (LEFT) (SWEEP))
(IF-MINE (LEFT) (SWEEP))) (IF-MINE (PROGN (LEFT)
(5,7)) (IF-MINE (V8A (LEFT) (3,0)) (V8A (SWEEP)
(1,2))))) (PROGN (V8A (V8A (SWEEP) (LEFT)) (IF-MINE
(LEFT) (SWEEP))) (IF-MINE (V8A (LEFT) (3,0)) (V8A
(SWEEP) (1,2)))))))
(defun ADF1 (ARG0)
(values (FROG (V8A (V8A (PROGN (LEFT) (ADF0)) (V8A ARG0
ARG0)) (IF-MINE (PROGN ARG0 ARG0) (V8A (ADF0)
(ADF0)))))))
(values (V8A (PROGN (PROGN (PROGN (V8A (V8A (V8A (ADF0)
(ADF0)) (ADF1 (SWEEP))) (ADF1 (ADF1 (LEFT)))) (PROGN
(ADF1 (1,7)) (PROGN (PROGN (ADF1 (ADF1 (LEFT))) (PROGN
(IF-MINE (V8A (ADF0) (ADF0)) (V8A (FROG (SWEEP)) (V8A
(ADF0) (ADF0)))) (FROG (ADF1 (ADF1 (ADF0)))))) (PROGN
(PROGN (LEFT) (7,5)) (ADF1 (LEFT)))))) (V8A (ADF0)
(SWEEP))) (ADF1 (ADF1 (ADF0)))) (V8A (ADF1 (ADF0))
(SWEEP)))).
The success predicate for this problem treats a score of 109 as a success for
purposes of making the performance curves, but runs with automatically
defined functions were permitted to run on in order to achieve a perfect score
of 116.
In run 2 with automatically defined functions, an 84-point program with a
perfect score of 116 emerged on generation 36:
(progn (defun ADF0 ()
         (values (IF-MINE (V8A (IF-MINE
                                 (PROGN (LEFT) (SWEEP))
                                 (IF-MINE (LEFT) (SWEEP)))
                               (IF-MINE
                                 (PROGN (LEFT) (5,7))
                                 (IF-MINE
                                   (PROGN (LEFT) (3,0))
                                   (PROGN (SWEEP) (1,2)))))
                          (PROGN (SWEEP) (LEFT)
                                 (IF-MINE (LEFT) (SWEEP))
                                 (IF-MINE (PROGN (LEFT) (3,0))
                                          (PROGN (SWEEP) (1,2)))))))
       (defun ADF1 (ARG0)
         (values (FROG (V8A (V8A (PROGN (LEFT) (ADF0))
                                 (V8A ARG0 ARG0))
                            (IF-MINE ARG0
                                     (V8A (ADF0) (ADF0)))))))
       (values (PROGN (ADF0) (ADF0) (ADF1 (SWEEP))
                      (ADF1 (ADF1 (LEFT)))
                      (ADF1 (7,7)) (ADF1 (ADF1 (LEFT)))
                      (IF-MINE (PROGN (ADF0) (ADF0))
                               (PROGN (FROG (SWEEP)) (ADF0)
                                      (ADF0)))
                      (FROG (ADF1 (ADF1 (ADF0))))
                      (LEFT) (ADF1 (LEFT))
                      (ADF0) (SWEEP) (ADF1 (ADF1 (ADF0)))
                      (ADF1 (ADF0)) (SWEEP)))).
The behavior of ADF0 in this program from run 2 can be analyzed by considering five cases.
Figure 14.5 shows case 1 wherein no mine is detected ahead of the minesweeper, the minesweeper moves north, turns left, and heads west for two
squares since no mines are ahead of the minesweeper at that point.
Figure 14.6 shows case 2 in which a mine is detected ahead of the minesweeper, the minesweeper immediately turns left to avoid it, and then finds
no other mines and keeps moving.
Figure 14.7 shows case 3 wherein no mine is detected ahead, the minesweeper moves north, turns left, finds a mine, and turns left again to avoid
the mine (thus heading south).
Figure 14.8 shows case 4 wherein no mine is detected ahead, the minesweeper moves north, and turns left. Seeing no mine, it moves forward (west),
finds a mine, and turns left (thus facing south).
Figure 14.9 shows case 5 wherein a mine is detected ahead of the minesweeper, the minesweeper turns left to avoid it, detects another mine, and
turns left again (thus facing south).
Figure 14.5 Case 1 of ADF0 from run 2 for the minesweeper problem.
Figure 14.6 Case 2 of ADF0 from run 2 for the minesweeper problem.
Figure 14.7 Case 3 of ADF0 from run 2 for the minesweeper problem.
Figure 14.8 Case 4 of ADF0 from run 2 for the minesweeper problem.
Figure 14.9 Case 5 of ADF0 from run 2 for the minesweeper problem.
Figure 14.10 Partial trajectory of 84-point program for run 2 of the minesweeper problem for
operations 0 through 30 with ADFs.
Figure 14.10 shows the trajectory of the minesweeper for this 84-point program with automatically defined functions for operations 0 through 30 for
run 2 of the minesweeper problem; figure 14.11 shows the continuation of
this trajectory for operations 30 through 60; figure 14.12 shows the final part
of this trajectory for operations 60 through 98.
Here, in contrast to the lawnmower problem, the regularity being exploited
by the automatically defined functions is not immediately obvious from
inspection of the trajectory. No obvious qualitative difference is evident
between the trajectory with automatically defined functions (figures 14.10,
14.11, and 14.12) and the trajectory without them (figures 14.1, 14.2, and 14.3).
Nonetheless, the beneficial effect of automatically defined functions becomes
apparent when one sees the statistics, over a series of runs, of the average
structural complexity and the computational effort.
The average structural complexity, S_with, of the 49 successful runs (out of 50
runs) of the minesweeper problem with automatically defined functions is
119.9 points.
Figure 14.13 presents the performance curves based on the 50 runs of the
minesweeper problem with automatically defined functions. The cumulative probability of success, P(M,i), is 94% by generation 25 and 98% by generation 50. The two numbers in the oval indicate that if this problem is run
through to generation 25, processing a total of E_with = 208,000 individuals
(i.e., 4,000 x 26 generations x 2 runs) is sufficient to yield a satisfactory result
for this problem with 99% probability.
Figure 14.11 Partial trajectory of the minesweeper for run 2 for operations 30 through 60
with ADFs.
Figure 14.12 Partial trajectory of the minesweeper for run 2 for operations 60 through 98
with ADFs.
Figure 14.13 Performance curves for the minesweeper problem showing that E_with = 208,000
with ADFs.
[Plot "With Defined Functions": P(M,i) and I(M,i,z) versus generation; M = 4,000, z = 99%, R(z) = 2, N = 50.]
Table 14.1 Comparison table for the minesweeper problem.
                                   Without ADFs    With ADFs
Average structural complexity S    342.4           119.9
Computational effort E             1,428,000       208,000
As previously mentioned, the use of only two fitness cases for this problem and the obstacle-avoiding robot problem was a compromise made to
save computer time. In making this compromise we placed greater weight
on demonstrating certain points about automatically defined functions
than on finding robust and complete solutions to the problems. The price for
this compromise was that the evolved programs for both problems are
overfitted to the minuscule number of fitness cases. For example, when one
of the best-of-run results from the obstacle-avoiding robot problem was
retested on 1,000 fitness cases (instead of just two), it scored only 45,278 hits
(75%) out of a possible 58,000 with automatically defined functions and 25,625
without them. When one of the best-of-run results from the minesweeper
problem was retested, it scored only 32,945 hits with automatically defined
functions and a mere 8,372 without them.
The two problems, of course, differ as to the importance of looking before
moving. When one of the best-of-run results from the obstacle-avoiding robot
problem was retested on 1,000 fitness cases, 73% of its moves (measured by a
counter inserted into the programs) were unprotected blind moves with
automatically defined functions as compared to 92% without them. When
one of the best-of-run results from the minesweeper problem was retested,
10% of its moves were unprotected blind moves with automatically defined
functions as compared to 87% without them. Thus, the successful programs
without automatically defined functions were memorizing the environment
more than the programs with them. The fact that so few (10%) of the moves
with automatically defined functions are unprotected probably indicates that
the behaviors in the ADFs are reused in different situations and therefore must
be more general.
Figure 14.14 Summary graphs for the minesweeper problem.
Table 14.2 Summary table of the structural complexity ratio, R_S, and the efficiency
ratio, R_E, for the lawnmower, obstacle-avoiding-robot, and minesweeper
problems.
Problem                     Structural complexity ratio R_S    Efficiency ratio R_E
Lawnmower - lawn size 64    3.65                               9.09
Obstacle-avoiding robot     2.71                               3.27
Minesweeper                 2.86                               6.87
14.6 SUMMARY
Table 14.1 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the minesweeper problem with automatically defined functions and without them.
Figure 14.14 summarizes the information in this comparison table and shows
a structural complexity ratio, R_S, of 2.86 and an efficiency ratio, R_E, of 6.87.
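As a quick check on these ratios, the following Python sketch recomputes them from the comparison-table values quoted in chapters 13 and 14. It assumes that each ratio is simply the without-ADFs value divided by the with-ADFs value; the dictionary layout and names are illustrative only.

```python
# (S_without, S_with, E_without, E_with) as quoted in tables 13.3 and 14.1.
COMPARISON = {
    "Obstacle-avoiding robot": (336.1, 123.9, 784_000, 240_000),
    "Minesweeper": (342.4, 119.9, 1_428_000, 208_000),
}


def ratios(s_without, s_with, e_without, e_with):
    """Return (R_S, R_E), each rounded to two decimals as in the tables."""
    return round(s_without / s_with, 2), round(e_without / e_with, 2)


summary = {name: ratios(*values) for name, values in COMPARISON.items()}

for name, (r_s, r_e) in summary.items():
    print(f"{name}: R_S = {r_s}, R_E = {r_e}")
```

A ratio above 1 means that the runs with automatically defined functions produced smaller programs (R_S) or required fewer individuals to be processed (R_E).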
We are unable to identify, either by analysis of the evolved programs or by
visualization of the trajectories of the minesweeper, the exact mechanism by
which the successful programs with automatically defined functions lower
the computational effort and their average size. Nonetheless, the structural
complexity ratio, R_S, of 2.86 and the efficiency ratio, R_E, of 6.87 are evidence
that the automatically defined functions have discovered and exploited some
regularity in this problem environment.
Both the problem of the obstacle-avoiding robot and the minesweeper problem demonstrate the benefits of automatically defined functions in an environment that is more complicated than the lawnmower problem.
Table 14.2 summarizes the structural complexity ratio, R_S, and the efficiency ratio, R_E, for the lawnmower, obstacle-avoiding-robot, and minesweeper problems.
15 Automatic Discovery of Detectors for Letter Recognition
This chapter (and chapters 16 through 20) present problems which, when
solved using automatically defined functions, illustrate the simultaneous discovery of initially-unknown detectors and a way of combining the just-discovered detectors. The detectors that are dynamically discovered during the
run of genetic programming are then repeatedly used in solving the problem.
The goal of dynamically discovering feature detectors, rather than
prespecifying them, has been a theme in the field of automated pattern recognition from its earliest days (Uhr and Vossler 1966). Indeed, for many problems, finding the detectors (i.e., identifying the regularities and patterns of
the problem environment), doing the recoding (i.e., changing the representation), and finding a way of solving the recoded problem really is the problem.
In fact, the broad goal of dynamically discovering detectors has been a
common thread running through the field of machine learning since its earliest days. Arthur Samuel's 1959 pioneering work involving learning to play
the game of checkers raised this issue. The pattern being recognized in Samuel's
system was not a pattern of pixels in an array, but rather an arrangement of
checker pieces on a playing board. The problem in Samuel's checker player
was not to classify patterns, but rather to play checkers. In spite of these differences, Samuel recognized the importance of getting learning to occur without predetermining the size and shape of the solution and of "[getting] the
program to generate its own parameters [detectors] for the evaluation polynomial" (Samuel 1959).
In Samuel's system, machine learning consisted of progressively adjusting numerical coefficients in an algebraic expression of a predetermined functional form (specifically, a polynomial of a specified order). Each component
term of the polynomial represented a handcrafted detector (parameter)
reflecting some aspect of the current state of the board (e.g., number of pieces,
center control, etc.). The polynomial calculated the value of a board to the
player by weighting each handcrafted detector with a numerical coefficient.
Thus, the polynomial could be used to compare the boards that would arise if
the player were to make various alternative moves. The best move could then
be selected from among the alternatives on the basis of the polynomial. If a
particular polynomial was good at assigning values to boards, good moves
would result. In Samuel's system, the numerical coefficients of the polynomial were adjusted with experience, so that the predictive quality of the polynomial progressively improved. In addition to hand-crafting the detectors,
Samuel predetermined the way the detectors would be combined to solve
the problem by selecting the particular functional form of the polynomial.
Samuel's 1959 checker player can be viewed in terms of the bottom-up formulation of the hierarchical problem-solving process.
15.1 THE PROBLEM
Figure 15.1 shows the letters I and L, each presented in a 6-by-4 pixel grid of
binary (ON or OFF) values.
The goal in this letter-recognition problem is to discover a computer program that can take any of the 2^24 possible patterns of bits as its input and
produce a correct identification I, L, or NIL (i.e., not the letter I or L) for the
pattern as its output.
Note that the correct identification of a pattern of pixels requires not only
establishing that all the specific pixels that must be ON are indeed ON, but
also inspecting other pixels on the grid to exclude the possibility of an imperfect letter or another letter.
15.2 PREPARATORY STEPS WITHOUT ADFs
There are, of course, many different ways to structure a computer program to
perform the task of letter recognition. The programs that are to be evolved in
this chapter consist of hierarchical combinations of local detectors.
If one were trying to describe the letter L to someone unfamiliar with the
Roman alphabet, one might give a dynamic description involving progressively drawing a vertical line of, say, five pixels downward from some specified starting location and then progressively drawing a horizontal line of, say,
two pixels to the right. This dynamic description of the pattern contains both
local and hierarchical aspects. The progressive pixel-by-pixel drawing of the
vertical and horizontal segments is a local activity; the assembly of the two
segments into the whole letter L occurs at a higher level of the hierarchy.
The local aspects of this dynamic approach to constructing a letter can be
implemented using a slow-moving turtle with limited vision. The turtle's
vision is limited to its immediate neighborhood of the nine pixels centered at
its current location. The pixel where the turtle is currently located is called
X (center) and the eight neighboring pixels are called N, NE, E, SE, S, SW, W,
and NW.
The hierarchical aspects of constructing a letter can be implemented by a
mechanism for moving the turtle. The turtle starts at a designated location on
the grid and can move one step at a time to the north (up), south (down), east
(right), west (left), northeast, southeast, southwest, and northwest. The sequence
of movements of the turtle can be varied according to what the turtle sees at
its current local position.
Figure 15.1 The letters I and L.
If there were only two categories to be recognized (say just the letter I and
the negative category NrL), a Boolean expression might be convenient both
for implementing the computation required to do the required classification
and for controlling the sensing and moving activities of the turtle. However,
when there are more than two possible outcomes, a decision tree is more
suitable for a multi-way classification of patterns (Quinlan 1986).
The terminal set, T, without automatically defined functions is
T = {I, L, NIL, X, N, NE, E, SE, S, SW, W, NW, (GO-N), (GO-NE), (GO-E),
(GO-SE), (GO-S), (GO-SW), (GO-W), (GO-NW)}.
The first three terminals in T (i.e., I, L, and NIL) are the three categories
into which a given pattern may be classified.
The next nine terminals in T (i.e., X, N, NE, E, SE, S, SW, W, and NW) are the
turtle's sensors of its nine-pixel local neighborhood.
The last eight terminals in T (i.e., (GO-N), (GO-NE), (GO-E), (GO-SE),
(GO-S), (GO-SW), (GO-W), (GO-NW)) are zero-argument side-effecting
operators that can move the turtle one step in any one of the eight possible
directions from its current location. For example, the side-effecting operator
(GO-N) moves the turtle north (up) one step in the 6-by-4 grid. For simplicity, the grid is toroidal. As the turtle moves, the values of the nine sensors (X,
N, NE, E, SE, S, SW, W, and NW) are dynamically redefined to reflect the turtle's
new location. Each operator returns the value (T or NIL) of the pixel to which
the turtle moves (i.e., it returns the new X).
The function set, F, without automatically defined functions is
F_rpb = {IF, AND, OR, NOT, HOMING}
with an argument map of
{3, 2, 2, 1, 1}.
Since the overall program is to be a decision tree, the function set includes
the three-argument decision-making if-then-else operator. The conditional IF
operator first evaluates its first argument. The IF operator executes its second (then) argument if (and only if) its first argument evaluates to something
other than NIL; the IF operator executes its third (else) argument if (and only
if) its first argument evaluates to NIL. This IF operator is implemented as a
macro in the same manner as the IF-FOOD-AHEAD operator for the artificial
ant problem (section 12.2). The fact that this IF operator always evaluates its
first argument and then only evaluates exactly one of its two remaining arguments is significant when these arguments themselves contain side-effecting
operations.
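Why the conditional must be a macro rather than an ordinary function can be seen in a small Python sketch. Python is used here only for illustration: zero-argument lambdas stand in for unevaluated LISP forms, and the helper names (`go`, `if_lazy`, `if_eager`) are hypothetical.

```python
moves_made = []  # log of side effects, one entry per turtle move


def go(direction, pixel_is_on):
    """Stand-in for a side-effecting move operator; returns the new pixel value."""
    moves_made.append(direction)
    return pixel_is_on


def if_lazy(condition, then_branch, else_branch):
    """Macro-like IF: evaluates the condition, then exactly one branch."""
    if condition():
        return then_branch()
    return else_branch()


def if_eager(condition_value, then_value, else_value):
    """Function-like IF: all three arguments were already evaluated by the caller."""
    return then_value if condition_value else else_value


# Macro-like behaviour: the nested condition in the else branch never runs.
lazy_result = if_lazy(
    lambda: go("GO-E", True),
    lambda: "I",
    lambda: if_lazy(lambda: go("GO-S", False), lambda: "L", lambda: "NIL"),
)
lazy_moves = list(moves_made)

# Function-like behaviour: the move in the third argument happens regardless.
moves_made.clear()
eager_result = if_eager(go("GO-E", True), "I", go("GO-S", False))
eager_moves = list(moves_made)

print(lazy_result, lazy_moves, eager_result, eager_moves)
```

Both calls return the same category, but only the macro-like version leaves the turtle where the selected branch put it.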
The AND, OR, and NOT are included in the function set to enable the program to create logical predicates.
The values returned by the Common LISP functions AND and OR are the
usual Boolean values; however, these functions have a behavior that becomes
significant when their arguments contain side-effecting operations. Specifically, if the first argument to a two-argument AND evaluates to NIL, the second argument of the AND is not evaluated at all and the AND returns NIL.
Similarly, if the first argument to a two-argument OR evaluates to something
other than NIL, the second argument of the OR is similarly short-circuited
and the OR returns that non-NIL value. Consequently, any side-effecting
operator contained in a short-circuited second argument is never executed.
The one-argument HOMING operator first remembers the current location
of the turtle and then evaluates its argument. HOMING has the additional
effect of rubber-banding the turtle back to its previously remembered position after completion of the evaluation of its argument. For example, suppose
the turtle starts at a certain position on the grid having an ON to its east and
an ON to its northeast. Then
(HOMING (AND (GO-E) (GO-N)))
would first move the turtle east; because the turtle sees an ON in that
location, the second argument of the AND would then be evaluated, moving the turtle to the north. Because the turtle would also see an ON in the
new location (to the northeast of its initial position), the call to (GO-N)
would return T, as would the AND. The value returned by the HOMING is
the value returned by its one argument, so the HOMING returns T. The
HOMING also returns the turtle to the remembered position that it was in
before beginning the HOMING. HOMING is equivalent to the brackets in a
Lindenmayer system (Prusinkiewicz and Lindenmayer 1990).
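The interaction of the move operators, short-circuit evaluation, and HOMING can be sketched as follows. This is an illustrative Python model, not the implementation used for the runs: the class and method names are hypothetical, True and False stand in for T and NIL, and Python's `and` supplies the short-circuit behavior described above.

```python
ROWS, COLS = 6, 4  # grid size from the text

# (row, column) offsets; row 0 is the top row, so moving north decreases the row.
MOVES = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


class Turtle:
    def __init__(self, grid, row, col):
        self.grid = grid
        self.row, self.col = row, col
        self.steps = 0  # number of moves actually executed

    def go(self, direction):
        """Move one step on the toroidal grid and return the new pixel value."""
        d_row, d_col = MOVES[direction]
        self.row = (self.row + d_row) % ROWS
        self.col = (self.col + d_col) % COLS
        self.steps += 1
        return self.grid[self.row][self.col]

    def homing(self, body):
        """Evaluate body(), then rubber-band back to the remembered position."""
        home = (self.row, self.col)
        value = body()
        self.row, self.col = home
        return value


def blank_grid():
    return [[False] * COLS for _ in range(ROWS)]


# ON pixels to the east and to the northeast of the starting position (3, 1).
grid = blank_grid()
grid[3][2] = True
grid[2][2] = True
turtle = Turtle(grid, 3, 1)
result = turtle.homing(lambda: turtle.go("E") and turtle.go("N"))

# On an all-OFF grid the first move returns False, so the second never runs.
dark = Turtle(blank_grid(), 3, 1)
dark_result = dark.homing(lambda: dark.go("E") and dark.go("N"))

print(result, turtle.steps, dark_result, dark.steps)
```

In the first case both moves execute and the expression is true; in the second only one move executes. In both cases the turtle ends at its starting position.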
Because of the complexity of the programs evolved by genetic programming for this problem, our ability to analyze and understand the evolved
programs for this problem can be greatly enhanced by imposing a constrained
syntactic structure that separates the different kinds of activity within the
decision tree. Specifically, the first (antecedent) argument of each IF operator
is constrained to be a composition of the three Boolean functions (AND, OR,
NOT), the HOMING function, and the eight turtle-moving operators, namely
(GO-N), (GO-NE), (GO-E), (GO-SE), (GO-S), (GO-SW), (GO-W), and
(GO-NW). In addition, the second (then) and third (else) arguments of each IF
operator are constrained to be compositions of the IF operator and the category-specifying terminals (I, L, and NIL). The initial random population is
randomly generated in conformity with these constraints; structure-preserving crossover is used to preserve this constrained syntactic structure.
The fitness cases for genetic programming must be chosen to represent a
sufficient variety of situations that the program is likely to generalize to handle
all possible combinations of inputs. In this regard, the fitness cases are similar
to the small set of combinations of inputs that are used to test and debug
computer programs written by human programmers.
Each individual in the population is tested against an environment consisting of 78 fitness cases, each consisting of a 6-by-4 pixel pattern and the correct
identification (I, L, or NIL) for that pattern. The set of fitness cases is constructed to include the two positive fitness cases (the two letters) and 76 different negative fitness cases. The negative cases include every version of the
letters I and L with one ON pixel deleted; every version of the letters I and L
with one extraneous ON pixel added adjacent to the correct pixels; checkerboard patterns; the all-ON and all-OFF patterns; various patterns bearing some
resemblance to I, L, or other letters; and various random patterns bearing no
resemblance to I, L, or other letters.
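One family of negative cases, the one-pixel deletions, is easy to generate mechanically. The Python sketch below is only an illustration: the placement of the letter L on the grid is a hypothetical one (a five-pixel vertical stroke followed by two pixels to the right, as in the verbal description earlier in this section) and is not read off figure 15.1.

```python
ROWS, COLS = 6, 4

# Hypothetical placement of the letter L as a set of (row, column) ON pixels:
# five pixels down column 1, then two more pixels to the right along row 4.
LETTER_L = frozenset({(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (4, 2), (4, 3)})


def one_pixel_deleted(pattern):
    """Return every variant of the pattern with exactly one ON pixel turned OFF."""
    return [pattern - {pixel} for pixel in sorted(pattern)]


def render(pattern):
    """Draw a pattern as rows of '#' (ON) and '.' (OFF)."""
    return "\n".join(
        "".join("#" if (r, c) in pattern else "." for c in range(COLS))
        for r in range(ROWS)
    )


deleted_cases = one_pixel_deleted(LETTER_L)
total_patterns = 2 ** (ROWS * COLS)  # every possible 6-by-4 binary pattern

print(render(LETTER_L))
print(len(deleted_cases), "one-pixel-deleted variants out of", total_patterns, "patterns")
```

Each variant would be paired with the identification NIL. The 78 fitness cases are a very small sample of the 16,777,216 possible patterns, which is why the choice of near-miss cases matters.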
Figure 15.2 shows seven negative fitness cases in which one pixel of the
letter L is missing.
Figure 15.3 shows 14 negative fitness cases in which one pixel is added to
the letter L.
Figure 15.4 shows ten negative fitness cases that are somewhat like the
letter L.
Figure 15.5 shows five negative fitness cases in which one pixel of the letter
I is missing.
Figure 15.6 shows 12 of the 13 negative fitness cases in which the letter I is
augmented by one pixel. The 13th such fitness case happens to be identical to
one of the fitness cases where one pixel is deleted from the L (i.e., the last
fitness case shown in figure 15.2).
Figure 15.2 Letter L with one pixel missing.
Figure 15.3 Letter L with one pixel added.
Figure 15.4 Extra fitness cases resembling the letter L.
Figure 15.5 Letter I with one pixel missing.
Figure 15.6 Letter I with one pixel added.
Figure 15.7 shows ten negative fitness cases with little similarity to the letters I or L.
Figure 15.8 shows 18 additional negative fitness cases.
Figure 15.7 Ten extra fitness cases with little similarity to the letters I or L.
When a genetically evolved program in the population is tested against a
particular fitness case, the outcome can be
• a true-positive (i.e., the program correctly identifies an I as an I or an L as
an L),
• a true-negative (i.e., the program correctly identifies a pattern that is not an I
or L as NIL),
• a false-positive (i.e., the program incorrectly identifies a non-letter as either
an I or an L),
• a false-negative (i.e., the program incorrectly identifies an I as NIL or an
L as NIL), or
• a wrong-positive (i.e., the program incorrectly identifies an I as an L or an L
as an I).
For this problem, fitness is the sum, over the fitness cases, of the weighted
errors produced by the program. The smaller the sum of the weighted
errors, the better. A 100%-correct pattern-recognizer would have a fitness
of 0. True-positives and true-negatives contribute 0 to the sum. False-positives and wrong-positives contribute 1. False-negatives contribute 23 (i.e.,
the number of pixels minus 1). This choice of 23 maintains consistency
with work (done earlier) reported in section 15.8 involving translation-invariant letter recognition.
Our choice of 8,000 as the population size reflects our estimate as to the
likely difficulty of this problem and the practical limitations on available computer time and memory.
Table 15.1 summarizes the key features of the letter-recognition problem
involving the letters I and L without automatically defined functions.
Figure 15.8 18 additional negative fitness cases.
Table 15.1 Tableau without ADFs for the letter-recognition problem.
Objective: Find a program that identifies a given 6-by-4 pixel pattern as being an I, L, or neither (NIL).
Terminal set without ADFs: I, L, NIL, X, N, NE, E, SE, S, SW, W, NW, (GO-N), (GO-NE), (GO-E), (GO-SE), (GO-S), (GO-SW), (GO-W), and (GO-NW).
Function set without ADFs: IF, AND, OR, NOT, and HOMING.
Fitness cases: 78 fitness cases, each consisting of a 6-by-4 pixel pattern and the associated correct identification (I, L, or NIL) for that pattern.
Raw fitness: The sum, over the 78 fitness cases, of the weighted errors produced by the program.
Standardized fitness: Same as raw fitness.
Hits: The number (unweighted) of fitness cases for which the identification produced by the program is correct.
Wrapper: None.
Parameters: M = 8,000. G = 51.
Success predicate: A program scores the maximum number of hits.
15.3 RESULTS WITHOUT ADFs
This problem is very time-consuming. We made several runs without automatically defined functions. Each run exhibited progressively better fitness, eventually coming close to a perfect score (i.e., a standardized fitness of 2); however, no run without automatically defined functions produced a solution to this problem. Because of the progressive improvement within these runs, we believe that this problem can be solved without automatically defined functions if given a larger population or if permitted to run for more generations. There was, however, no prospect of getting multiple successful runs of this problem without automatically defined functions with any reasonable amount of computer time.
15.4 PREPARATORY STEPS WITH ADFs
In applying genetic programming with automatically defined functions to the problem of letter recognition, we want the result-producing branch to return the identification of the pattern. The desired biases toward both local inspection and overall hierarchical structure can be attained by specifying that the result-producing branch is capable of sensing only the single pixel where the turtle is currently located, and does not have direct access to any other pixels. The result-producing branch can expand its view somewhat by calling on detectors (ADFs) that are capable of sensing the nine-pixel local neighborhood of the turtle. In addition, the result-producing branch can globally expand its view by moving the turtle. The result-producing branch will consist of compositions of the operations for moving the turtle and Boolean conjunctions, disjunctions, and negations operating on the values returned by the detectors.
The function-defining branches define detectors that examine the entire nine-pixel local neighborhood of the turtle and evaluate compositions of Boolean functions involving what the turtle sees. That is, the function definitions will be compositions of the Boolean conjunctions, disjunctions, and negations and the nine pixel sensors (X, N, NE, E, SE, S, SW, W, and NW).
In applying genetic programming with automatically defined functions to this problem, we decided that each individual overall program in the population will consist of five function-defining branches (defining detectors called ADF0 through ADF4) and a final result-producing branch. Since the automatically defined functions are designed merely for detecting small local 3-by-3 patterns, they have no need to refer hierarchically to one another.
We first consider the five function-defining branches (i.e., the detectors).
Since the five function-defining branches are to define detectors that are capable of analyzing what the turtle sees at its current location on the grid, the terminal set for ADF0, ADF1, ADF2, ADF3, and ADF4 consists of the nine pixel sensors, so that
T_adf = {X, N, NE, E, SE, S, SW, W, NW}.
The function set, F_adf, for the five function-defining branches is
F_adf = {AND, OR, NOT}
with an argument map of
{2, 2, 1}.
Notice that there are no side-effecting functions in the function-defining branches. They are designed solely for detecting small local patterns, not for moving the turtle.
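A detector of this kind can be sketched in Python as a side-effect-free evaluator over the nine sensors. The nested-tuple representation, the function names, the wrap-around behavior of the sensors at the grid edges, and the placement of the letter L in the grid are all assumptions made for this sketch; only the sensor names and the three Boolean functions come from the text.

```python
ROWS, COLS = 6, 4  # the 6-by-4 pixel grid

# (row, column) offset of each of the nine pixel sensors relative to the turtle
OFFSETS = {
    "X": (0, 0), "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

# Assumed placement of the letter L (1 = ON); the turtle is taken to start at (2, 1).
LETTER_L = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]


def sense(grid, turtle, name):
    """Value of the named pixel; wrapping at the edges is an assumption."""
    row, col = turtle
    d_row, d_col = OFFSETS[name]
    return bool(grid[(row + d_row) % ROWS][(col + d_col) % COLS])


def eval_detector(expr, grid, turtle):
    """Evaluate a detector body written as nested tuples, e.g. ("AND", "N", "X")."""
    if isinstance(expr, str):
        return sense(grid, turtle, expr)
    op, *args = expr
    values = [eval_detector(arg, grid, turtle) for arg in args]
    if op == "NOT":
        return not values[0]
    if op == "AND":
        return values[0] and values[1]
    if op == "OR":
        return values[0] or values[1]
    raise ValueError(f"unknown function: {op}")
```

Because the detectors contain no turtle-moving operators, every argument can be evaluated eagerly here without changing the result.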
Each of the five function-defining branches is a composition of functions from the function set, F_adf, and terminals from the terminal set, T_adf.
This is the first problem in this book in which there are multiple function-defining branches and yet the function-defining branches do not refer hierarchically to one another. Moreover, this is the first problem where all the automatically defined functions have identical terminal sets, function sets, and argument maps. In implementing structure-preserving crossover in this situation, one might assign one common type to all five like branches (called like-branch typing) or one might assign five separate types to the five branches (i.e., our default approach of branch typing). An experiment involving branch typing and like-branch typing is found in Genetic Programming (section 25.11). We have chosen to continue to use our usual branch typing for this problem even though like-branch typing would have been a reasonable choice.
We now consider the result-producing branch.
We envisage that the result-producing branch of each program will be a decision tree consisting of compositions of decision-making functions that return the identification I, L, or NIL.
Thus, the terminal set, T_rpb, for the result-producing branch is
T_rpb = {I, L, NIL, (GO-N), (GO-NE), (GO-E), (GO-SE), (GO-S), (GO-SW), (GO-W), (GO-NW)}.
The five automatically defined functions (ADF0 through ADF4) that constitute the detectors are included in the function set of the result-producing branch.
Thus, the function set, F_rpb, for the result-producing branch is
F_rpb = {ADF0, ADF1, ADF2, ADF3, ADF4, IF, AND, OR, NOT, HOMING}
with an argument map of
{0, 0, 0, 0, 0, 3, 2, 2, 1, 1}.
The result-producing branch is a composition of the functions from the function set, F_rpb, and terminals from the terminal set, T_rpb.
Our ability to analyze and understand the evolved programs for this problem can be greatly enhanced by imposing a constrained syntactic structure. Specifically, the first (antecedent) argument of each IF operator is constrained to be a composition of the three Boolean functions (AND, OR, NOT), the five automatically defined functions (ADF0, ADF1, ADF2, ADF3, and ADF4), the HOMING function, and the eight turtle-moving operators, namely (GO-N), (GO-NE), (GO-E), (GO-SE), (GO-S), (GO-SW), (GO-W), and (GO-NW). In addition, the second (then) and third (else) arguments of each IF operator are constrained to be either calls to the IF operator or the category-specifying terminals (I, L, and NIL).
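These rules of construction can be expressed as a small recursive check. The Python sketch below is illustrative only: programs are written as nested tuples, zero-argument calls such as (GO-N) and (ADF3) are written as the strings "GO-N" and "ADF3", and all function names are assumptions of the sketch rather than part of the original system.

```python
CATEGORIES = {"I", "L", "NIL"}
MOVES = {"GO-N", "GO-NE", "GO-E", "GO-SE", "GO-S", "GO-SW", "GO-W", "GO-NW"}
DETECTORS = {"ADF0", "ADF1", "ADF2", "ADF3", "ADF4"}
CONDITION_ARITY = {"AND": 2, "OR": 2, "NOT": 1, "HOMING": 1}


def is_condition(expr):
    """True if expr may appear in the first (antecedent) argument of an IF."""
    if isinstance(expr, str):
        return expr in MOVES or expr in DETECTORS
    op, *args = expr
    return (op in CONDITION_ARITY
            and len(args) == CONDITION_ARITY[op]
            and all(is_condition(arg) for arg in args))


def is_decision(expr):
    """True if expr may appear as a then-part or else-part of an IF."""
    if isinstance(expr, str):
        return expr in CATEGORIES
    if len(expr) != 4 or expr[0] != "IF":
        return False
    _, condition, then_part, else_part = expr
    return (is_condition(condition)
            and is_decision(then_part)
            and is_decision(else_part))


def is_valid_result_producing_branch(expr):
    """The root must be an IF; a bare category terminal is not accepted."""
    return not isinstance(expr, str) and is_decision(expr)
```

A generator of random programs or a crossover operator would need to preserve these same categories of points; the check above only tests a finished tree.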
Table 15.2 summarizes the key features of the letter-recognition problem involving the letters I and L with automatically defined functions.
15.5 RESULTS WITH ADFs
A review of one particular successful run will illustrate how genetic programming simultaneously evolves the detectors and evolves a way of combining the detectors for the problem of letter recognition.
The 8,000 randomly generated individuals found in the initial generation of the population (generation 0) are, as one would expect, not very good. The worst individual pattern-recognizer in the population for generation 0 has 44 points and has the highly unfavorable fitness (weighted error) of 3,535. This worst-of-generation program is
(progn (defun ADF0 ()
         (values (AND (NOT S) (NOT SW))))
       (defun ADF1 ()
         (values (AND (OR S W) (AND N N))))
       (defun ADF2 ()
         (values (AND (OR W E) (OR X W))))
       (defun ADF3 ()
         (values (AND (OR N E) (AND X NE))))
       (defun ADF4 ()
         (values (OR (NOT N) (OR NE SW))))
       (values (IF (OR (ADF3) (GO-N)) (IF (ADF4) I NIL)
                   (IF (GO-N) I L)))).
Table 15.2 Tableau with ADFs for the letter-recognition problem.
Objective: Find a program that identifies a given 6-by-4 pixel pattern as being an I, L, or NIL (neither).
Architecture of the overall program with ADFs: One result-producing branch and five zero-argument function-defining branches. No hierarchical references between function-defining branches.
Parameters: Branch typing among the five automatically defined functions (detectors).
Terminal set for the result-producing branch: I, L, NIL, (GO-N), (GO-NE), (GO-E), (GO-SE), (GO-S), (GO-SW), (GO-W), and (GO-NW).
Function set for the result-producing branch: IF, AND, OR, NOT, HOMING, ADF0, ADF1, ADF2, ADF3, and ADF4.
Terminal set for the function-defining branches ADF0, ADF1, ADF2, ADF3, and ADF4: X, N, NE, E, SE, S, SW, W, and NW.
Function set for the function-defining branches ADF0, ADF1, ADF2, ADF3, and ADF4: AND, OR, and NOT.
Types of points for result-producing branch: The result-producing branch is to be a decision tree.
• The IF operator (which is always at the root).
• Point in first argument (condition part) of an IF.
• Point in second (then) or third (else) argument of an IF.
Rules of construction for result-producing branch:
The root node must be an IF.
The first argument (condition part) of an IF may contain any composition of the three Boolean operators (AND, OR, NOT), the five automatically defined functions (ADF0, ADF1, ADF2, ADF3, and ADF4), HOMING, and the eight turtle-moving operators (GO-N), (GO-NE), (GO-E), (GO-SE), (GO-S), (GO-SW), (GO-W), and (GO-NW).
The second (then) and third (else) arguments of an IF contain only other IFs or category-specifying terminals (I, L, or NIL).
However, even in a randomly created population of programs, some individuals are better than others. For example, an individual at the 33rd percentile of generation 0 has fitness of 1,771.
The median program (i.e., the 50th percentile) from generation 0 has 18 points and a fitness of 152.
(progn (defun ADF0 ()
         (values (NOT NE)))
       (defun ADF1 ()
         (values (AND N X)))
       (defun ADF2 ()
         (values (OR N NW)))
       (defun ADF3 ()
         (values (AND W NE)))
       (defun ADF4 ()
         (values (AND W W)))
       (values (IF (GO-E) L I))).
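The point counts quoted for these programs can be checked mechanically. In the Python sketch below, each branch of the median program is written as a nested tuple, with the zero-argument call (GO-E) written as the string "GO-E"; the representation and the name count_points are assumptions made for illustration.

```python
def count_points(expr):
    """Count the points (functions and terminals) in a nested-tuple program tree."""
    if isinstance(expr, str):
        return 1  # a terminal or a zero-argument call
    return 1 + sum(count_points(arg) for arg in expr[1:])


# The six branches of the median program of generation 0
MEDIAN_PROGRAM = [
    ("NOT", "NE"),            # ADF0
    ("AND", "N", "X"),        # ADF1
    ("OR", "N", "NW"),        # ADF2
    ("AND", "W", "NE"),       # ADF3
    ("AND", "W", "W"),        # ADF4
    ("IF", "GO-E", "L", "I"), # result-producing branch
]

total_points = sum(count_points(branch) for branch in MEDIAN_PROGRAM)
```

Summed over the five detectors and the result-producing branch, this gives the 18 points reported for the median program.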
The best of generation 0 has 186 points and a fitness of 53.
Figure 15.9 shows, by generation, the fitness of the best-of-generation program. As can be seen, it tends to improve (i.e., drop) from generation to generation.
Figure 15.10 shows the hits histograms for generations 15, 30, and 50 of this run. The horizontal axis of the hits histogram represents the number of hits (0 to 78); the vertical axis represents the number of individuals in the population (0 to 8,000) scoring that number of hits. Four solutions to the problem emerge at generation 50.
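A hits histogram of this kind is simply a tally of the population by number of hits. A minimal Python sketch, assuming the hits of each individual are available as a list of integers (the function name is hypothetical):

```python
from collections import Counter


def hits_histogram(hits_per_individual, max_hits=78):
    """Number of individuals scoring each possible number of hits, 0 to max_hits."""
    counts = Counter(hits_per_individual)
    return [counts.get(hits, 0) for hits in range(max_hits + 1)]
```

The last entry of the returned list is the number of individuals scoring the maximum number of hits, that is, the number of solutions in that generation.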
By generation 50, the best-of-generation program has 312 points (of which 149 are in the result-producing branch) and has the perfect fitness value of 0. This best-of-run individual is shown below:
(progn (defun ADF0 ()
  (values (OR (OR (AND W SE) (OR (AND (NOT (OR SW SW))
    (NOT (AND X SW))) (NOT (OR (NOT S) (AND X NW)))))
    (AND (OR (NOT S) (AND W SE)) (OR (OR S X)
    (NOT N))))))
 (defun ADF1 ()
  (values (AND (AND (NOT (NOT X)) (NOT (OR S X)))
    (NOT (OR S X)))))
 (defun ADF2 ()
  (values (OR (AND (NOT (AND W E)) (OR (AND NW W) (NOT
    NW))) (OR (OR (AND N E) (AND S SE)) (OR (AND W (NOT
    NW)) (OR SE NE))))))
 (defun ADF3 ()
  (values (AND (NOT (AND (NOT SE) (OR W SW))) (OR (NOT
    (OR NW (NOT NW))) (AND (NOT S) (AND (NOT (AND (NOT
    SE) (OR W SW))) (OR (NOT (OR NW (NOT (AND (NOT SE)
    (OR W SW))))) (AND (NOT S) (OR (NOT (NOT NW)) (NOT
    SE))))))))))
 (defun ADF4 ()
  (values (AND (NOT (OR (OR W SW) (OR NW NW))) (AND (AND
    (AND X N) (NOT NE)) (AND (NOT (OR (OR E SW) (OR NW
    NW))) (AND (AND (AND (AND X N) (NOT NE)) (NOT NE))
    (OR (OR N SE) (OR X E))))))))
 (values (IF (OR (NOT (ADF4))
    (AND (OR (NOT (AND (GO-S) (GO-S)))
      (AND (OR (NOT (AND (GO-S) (GO-S)))
        (AND (NOT (AND (ADF3) (ADF3))) (HOMING (GO-S))))
      (OR (NOT (AND (GO-S) (GO-S)))
        (AND (HOMING (GO-N)) (HOMING (GO-N))))))
    (OR (NOT (ADF4))
      (AND (OR (NOT (AND (GO-S) (GO-S)))
        (AND (OR (NOT (AND (GO-S) (GO-S)))
          (AND (NOT (AND (ADF3) (ADF3))) (HOMING (GO-N))))
        (OR (NOT (AND (GO-S) (GO-S)))
          (AND (HOMING (GO-N)) (HOMING (GO-N))))))
      (OR (NOT (AND (GO-S) (GO-S)))
        (AND (NOT (AND (GO-S) (ADF3))) (HOMING (GO-N))))))))
    ;;; antecedent of outermost IF
    (IF (HOMING (AND (GO-S) (ADF0))) (IF (GO-S) NIL L) (IF
      (HOMING (GO-S)) (IF (ADF1) L I) (IF (ADF1) L NIL)))
    ;;; then-part of outermost IF
    (IF (OR (OR (GO-E) (ADF3)) (AND (OR (NOT (ADF4)) (AND
      (NOT (ADF3)) (OR (NOT (AND (GO-S) (GO-S))) (AND (NOT
      (GO-S)) (AND (ADF3) (ADF3)))))) (HOMING (GO-N))))
      (IF (ADF2) (IF (GO-S) L NIL) (IF (GO-S) (IF (ADF3) L NIL)
        (IF (GO-S) NIL L)))
      (IF (NOT (ADF1)) (IF (GO-E) NIL I)
        (IF (ADF1) L L)))
    ;;; else-part of outermost IF
    ))).
Figure 15.9 Standardized fitness of the best-of-generation programs of the letter-recognition problem with ADFs. (Horizontal axis: generation.)
Figure 15.10 Hits histograms for generations 15, 30, and 50 of the letter-recognition problem with ADFs.
The performance of the 100%-correct best-of-run individual from generation 50 from the run described above can be understood by first considering how this program successfully recognizes the pixel pattern for the letter L. To aid in this process, figure 15.11 identifies the seven pixels that must be ON for the pattern to be the letter L with the Roman numerals I through VII and identifies the 14 adjacent pixels that must be OFF for the letter L with the lower-case letters a through n. The turtle always starts at pixel III for any pattern.
When the best-of-run individual from generation 50 encounters an L, it moves the turtle 25 times. Figure 15.12 shows the 25 steps in this trajectory with the turtle starting at pixel III and ending at pixel n. The figure omits certain numbers where the turtle repetitively moves back and forth over the same two adjacent pixels.
Since the result-producing branch begins with (IF (OR (NOT (ADF4)) ..., the defined function ADF4 is evaluated with the turtle at its starting location (pixel III) at turtle step 0. Detector ADF4 examines seven of the nine pixels within view.
Figure 15.13 shows the seven pixel values required to cause the function definition for ADF4 to return a value of T. ADF4 depends on seven pixels, lacks any reference to pixel S, and effectively ignores pixel SE.
Figure 15.11 The seven ON and 14 OFF pixels constituting the letter L.
Figure 15.12 Trajectory of the turtle for identifying an L for best-of-run program from generation 50.
Figure 15.13 Arrangement of pixels required to cause ADF4 to return T.
Figure 15.14 Detector ADF4 applied at pixel III at turtle step 0.
When the turtle is located at pixel III (as shown in figure 15.14), pixels II and III are ON, and pixels b, c, d, m, and l are OFF, ADF4 returns T. These latter five pixels all lie adjacent to the vertical segment of the L. As a result, when the turtle is at pixel III, ADF4 acts as a detector for two of the three vertically stacked pixels of a potential L being ON and as a detector for five of the six pixels adjacent to the potential L being OFF. ADF4 is an incomplete detector for a vertical line segment.
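The statement that ADF4 depends on seven pixels can be checked by brute force over all 512 settings of the nine sensors. The Python sketch below transcribes the body of ADF4 from the best-of-run listing; the transcription, the helper names, and the dictionary representation of the neighborhood are assumptions made for this check.

```python
from itertools import product

SENSORS = ["X", "N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def adf4(p):
    """Transcription of the evolved ADF4; p maps each sensor name to True/False."""
    west_clear = not ((p["W"] or p["SW"]) or (p["NW"] or p["NW"]))
    stem = (p["X"] and p["N"]) and not p["NE"]
    east_clear = not ((p["E"] or p["SW"]) or (p["NW"] or p["NW"]))
    tail = (stem and not p["NE"]) and ((p["N"] or p["SE"]) or (p["X"] or p["E"]))
    return west_clear and (stem and (east_clear and tail))


def relevant_sensors(detector):
    """Sensors whose value changes the detector's output for at least one setting."""
    relevant = set()
    for values in product([False, True], repeat=len(SENSORS)):
        pixels = dict(zip(SENSORS, values))
        for name in SENSORS:
            flipped = dict(pixels)
            flipped[name] = not pixels[name]
            if detector(pixels) != detector(flipped):
                relevant.add(name)
    return relevant


def accepting_settings(detector):
    """All sensor settings for which the detector returns T."""
    settings = (dict(zip(SENSORS, values))
                for values in product([False, True], repeat=len(SENSORS)))
    return [s for s in settings if detector(s)]
```

Under this transcription, the detector returns T exactly when X and N are ON and W, SW, NW, NE, and E are OFF, with S and SE free to take either value.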
Since ADF4 returns T when the turtle is located at pixel III, (NOT (ADF4)) is NIL, so the second clause of the first OR must be evaluated. This second clause begins with (AND (OR (NOT (AND (GO-S) (GO-S))) .... When the first argument of the inner AND, namely (GO-S), is evaluated, the turtle moves south (down the vertical segment of the L) from pixel III to pixel IV. Since pixel IV is ON after this turtle step 1, the (GO-S) operator returns T, thus necessitating evaluation of the second argument of this inner AND. This second argument, which consists of another (GO-S) operator, moves the turtle south again from pixel IV to pixel V. Since pixel V is also ON after turtle step 2, the inner AND returns T. However, the NOT necessitates evaluation of the second argument of the OR, namely
(AND (OR (NOT (AND (GO-S) ... )) ... )).
The (GO-S) operator (the first argument to the inner AND above) now moves the turtle south to pixel g. Pixel g is OFF at turtle step 3, since it is below the vertical segment of the L. When we evaluate the two-argument Boolean AND function, it skips evaluation of the second argument whenever the first argument evaluates to NIL. (Similarly, evaluation of the second argument of the OR function is skipped whenever the first argument evaluates to T.) The second (GO-S) argument to the AND is replaced with an ellipsis, since it will not be evaluated. This illustrates the fact that when side-effecting operators are contained in arguments to Boolean functions, the operators are conditionally executed depending on the context. Since the NOT negates the NIL returned by the AND, the second argument to the OR containing seven points is also skipped. We replace it with a second ellipsis.
Now, two (GO-S) operators move the turtle to pixels I and II (since the grid is toroidal) at turtle steps 4 and 5. The (GO-N) moves the turtle back to pixel I at turtle step 6, but the HOMING rubber-bands the turtle back to pixel II at turtle step 7. This sequence is repeated at turtle steps 8 and 9, leaving the turtle at pixel II.
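The first nine turtle steps of this walkthrough can be replayed with a small interpreter. The Python sketch below is an illustration under several assumptions: the grid placement of the L, the (row, column) coordinates, the class and function names, and the reading of HOMING as "evaluate the argument, then return the turtle to where it was" are inferred from the walkthrough rather than taken from a definition. The seven-point subtree that the text says is skipped is replaced by a stand-in, since the sketch implements no detectors.

```python
ROWS, COLS = 6, 4
MOVES = {
    "GO-N": (-1, 0), "GO-NE": (-1, 1), "GO-E": (0, 1), "GO-SE": (1, 1),
    "GO-S": (1, 0), "GO-SW": (1, -1), "GO-W": (0, -1), "GO-NW": (-1, -1),
}

# Assumed placement of the L: pixels I to V in column 1, pixel g below them.
LETTER_L = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]


class Turtle:
    def __init__(self, grid, start):
        self.grid = grid
        self.pos = start
        self.trail = [start]  # position after each turtle step; entry 0 is the start


def evaluate(expr, turtle):
    if isinstance(expr, str):  # a zero-argument turtle-moving operator
        d_row, d_col = MOVES[expr]
        row, col = turtle.pos
        turtle.pos = ((row + d_row) % ROWS, (col + d_col) % COLS)  # toroidal grid
        turtle.trail.append(turtle.pos)
        return bool(turtle.grid[turtle.pos[0]][turtle.pos[1]])
    op, *args = expr
    if op == "NOT":
        return not evaluate(args[0], turtle)
    if op == "AND":  # the second argument is skipped if the first is NIL
        return evaluate(args[0], turtle) and evaluate(args[1], turtle)
    if op == "OR":   # the second argument is skipped if the first is T
        return evaluate(args[0], turtle) or evaluate(args[1], turtle)
    if op == "HOMING":
        home = turtle.pos
        value = evaluate(args[0], turtle)
        turtle.pos = home  # rubber-band the turtle back
        turtle.trail.append(home)
        return value
    raise ValueError(f"unknown function: {op}")


TWO_SOUTH = ("NOT", ("AND", "GO-S", "GO-S"))
SKIPPED = ("HOMING", "GO-S")  # stand-in for the skipped seven-point subtree
CLAUSE = ("OR", TWO_SOUTH,
          ("AND", ("OR", TWO_SOUTH, SKIPPED),
                  ("OR", TWO_SOUTH,
                   ("AND", ("HOMING", "GO-N"), ("HOMING", "GO-N")))))

turtle = Turtle(LETTER_L, (2, 1))  # pixel III
result = evaluate(CLAUSE, turtle)
```

With these assumptions, the trail visits pixels IV, V, g, I, and II and then alternates between I and II, ending at pixel II after nine steps, in agreement with the description above.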
Detector ADF4 is now applied at pixel II. Figure 15.15 shows that when the turtle is located at pixel II, ADF4 returns T when pixels I and II are ON and when pixels a, b, c, m, and n are OFF.
Detector ADF4 returns T and thereby provides the new information that pixels a and n are OFF. As a result, by turtle step 9, seven pixels (a, b, c, d, l, m, and n) have been verified as being OFF by two different applications of detector ADF4 and one additional pixel (g) has been verified as being OFF by the (GO-S) operator. In addition, five pixels (I, II, III, IV, and V) have been verified (often repetitively) as being ON. Several pixels have been verified by more than one action by turtle step 9.
Between turtle steps 10 and 21, the turtle repetitively moves up and down (because of several HOMINGs) along the vertical segment of the L, but provides no new information.
The turtle arrives at pixel IV at turtle step 21 and begins evaluation of the last six points of the antecedent clause of the outermost IF of the result-producing branch, namely
(AND (NOT (AND (GO-S) (ADF3)))
(HOMING (GO-N)))))))).
The (GO-S) operator moves the turtle to pixel V (the junction of the L, which is ON) and executes detector ADF3.
Figure 15.16 shows the pixel values required to cause ADF3 to return a value of T. As can be seen, ADF3 examines five pixels (NW, W, SW, S, and SE) to see if they are all OFF.
Figure 15.17 shows that when the turtle is located at pixel V (the corner of the L), ADF3 returns T when pixels d, e, f, g, and h are OFF. These five pixels all lie adjacent to the corner of the L, so ADF3 acts as a detector for emptiness adjacent to the corner of an L.
Figure 15.15 Detector ADF4 applied at pixel II at turtle step 9.
Figure 15.16 Arrangement of pixels required to cause (ADF3) to return a value of T.
Since the NOT negates the return value of the AND, the (HOMING (GO-N)) is not executed.
For the letter L, the antecedent part of the outermost IF of the result-producing branch evaluates to NIL, so that the second (then) argument of the IF is skipped and the third (else) argument is executed. The (GO-E) operator moves the turtle east to pixel VI (which is ON) and causes detector ADF2 to be evaluated for turtle step 23.
Figure 15.18 shows that ADF2 examines six pixels. ADF2 returns a value of NIL when NW, W, and E are ON and when N, NE, and SE are OFF.
Figure 15.19 shows that when the turtle is located at pixel VI at turtle step 23, ADF2 returns a value of NIL when pixels IV, V, and VII are ON and when pixels i, j, and k are OFF.
The result now depends on the following expression involving detector ADF2:
1 (IF (ADF2)
2     (IF (GO-S) L NIL)
3     (IF (GO-S)
4         (IF (ADF3) L NIL)
5         (IF (GO-S) NIL L)) ... ).
Figure 15.17 Detector ADF3 applied at pixel V at turtle step 22.
Figure 15.18 Arrangement of pixels required to cause ADF2 to return a value of NIL.
Figure 15.19 Detector ADF2 applied at pixel VI at turtle step 23.
Figure 15.20 Turtle steps 0, 9, 22, and 23, where detectors ADF4, ADF4, ADF3, and ADF2, respectively, are applied.
For the letter L, ADF2 evaluates to NIL on line 1. As a result, line 2 is skipped and the (GO-S) operator on line 3 moves the turtle south to pixel h at turtle step 24. Since pixel h is OFF, line 4 is skipped and the (GO-S) operator on line 5 moves the turtle toroidally to pixel n for turtle step 25. Since pixel n is OFF for the L, the result-producing branch returns L, which is indeed the correct identification of the pattern.
If detector ADF2 were to return T on line 1 above, this indicates either that pixel i, j, or k is ON or that pixel VII is OFF (since pixels IV and V were previously established as being ON). Any of these four possibilities would mean that the pattern is a flawed pattern for which NIL (rather than L or I) should be returned. For example, if pixel VII were OFF and pixel i were ON or if pixel VII were ON and pixel i were ON, the pattern would be a flawed L. In these situations, the (GO-S) operator on line 2 would be executed, thereby moving the turtle to pixel h. Since pixel h is already known to be OFF, the result-producing branch would return NIL, which under the circumstances would be a correct identification of the pattern. Similarly, the result-producing branch returns the value NIL for the 14 fitness cases for which there is an extraneous ON pixel adjacent to an L (in locations a through n) and the seven fitness cases for which there is a missing pixel within an L (in locations I through VII).
In summary, the best-of-run individual from generation 50 applies detectors at turtle steps 0, 9, 22, and 23 and considers direct input from the turtle over 25 steps in order to determine that all seven pixels (I through VII) that should be ON for an L are indeed ON and that all 14 pixels that should be OFF for an L are indeed OFF, as shown in figure 15.20.
In identifying the letter I, the turtle moves up and down the vertical column consisting of pixels I through V and pixel g for the first 22 turtle steps much as in the identification of the L. However, when the turtle moves east on turtle step 23, pixel VI is OFF for the I.
The detector ADF0 is used to determine that certain patterns should be classified as NIL, although it is not used in classifying the positive cases. Although ADF1 is merely the constant function NIL, it appears, and is used, in the result-producing branch.
Even when automatically defined functions are used, this problem is so time-consuming that we have only made two successful runs. We do not contemplate making a performance curve for this problem.
15.6 GENEALOGICAL AUDIT TRAILS WITH ADFs
A genealogical audit trail shows the way that the crossover operation creates offspring programs that are progressively more fit at classifying patterns. For example, most of the genetic material for the best of generation 15 comes from the 180th best program in the population from generation 14. In fact, the best of generation 15 differs from this first parent only by the six-point subtree (OR W (NOT (AND N SW))) of ADF1. Parent A from generation 14 scores 47 hits and is shown below:
(progn (defun ADF0 ()
  (values (OR (OR (AND (OR S X) (OR N SW)) (OR (AND (NOT
    (OR SW SW)) (NOT (AND X SW))) (OR (OR (AND X NW) (AND
    SE NE)) (AND (OR NE E) (OR X SW))))) (AND (OR (NOT S)
    (NOT (OR S SW))) (OR (AND W SE) (NOT N))))))
 (defun ADF1 ()
  (values (AND (AND (NOT (NOT X)) (NOT (OR S X))) (OR W
    (NOT (AND N SW))))))
 (defun ADF2 ()
  (values (OR (AND (NOT (AND W E)) (OR (AND NW W) (NOT
    NW))) (OR (OR (AND N E) (AND S SE)) (OR (AND W (NOT
    NW)) (OR SE NE))))))
 (defun ADF3 ()
  (values (AND (NOT (AND (NOT SE) (OR W SW))) (OR (NOT
    (OR NW (NOT NW))) (AND (NOT S) (OR (NOT (NOT NW))
    (NOT SE)))))))
 (defun ADF4 ()
  (values (AND (NOT (OR (OR W SW) (OR NW NW))) (AND (AND
    (AND X N) (NOT NE)) (OR (OR N SE) (OR X E))))))
 (values (IF (OR (NOT (ADF4)) (AND (NOT (ADF3)) (OR (NOT
    (AND (GO-S) (GO-S))) (AND (NOT (AND (ADF3) (ADF3)))
    (HOMING (GO-N))))))
    (IF (HOMING (AND (GO-S) (ADF0)))
      (IF (GO-S) NIL L) (IF (HOMING (GO-S)) (IF (ADF1) L I)
      (IF (GO-W) L NIL)))
    (IF (OR (OR (GO-E) (ADF3)) (AND (ADF3) (ADF3)))
      (IF (HOMING (GO-N)) (IF (ADF1) L NIL) (IF (ADF3) NIL L))
      (IF (NOT (ADF1)) (IF (GO-E) NIL I)
      (IF (ADF1) L L)))))).
Parent B from generation 14 scores 48 hits and contributes only the four-point subtree (NOT (OR S X)) from its ADF1.
(progn (defun ADF0 ()
  (values (OR (OR (AND (OR S X) (OR N SW)) (OR (OR (AND X
    SW) N) (AND S NW))) (AND (OR (NOT S) (AND S SE)) (AND
    (AND W SW) (OR NW W))))))
 (defun ADF1 ()
  (values (AND (AND (NOT (NOT X)) (NOT (OR S X))) (OR
    (AND (AND SW NW) (NOT SW)) (NOT (AND N SW))))))
 (defun ADF2 ()
  (values (OR (AND (NOT (AND W E)) (OR (AND NW W) (NOT
    NW))) (OR (OR (AND N E) (AND S SE)) (OR (AND W SE)
    (OR SE NE))))))
 (defun ADF3 ()
  (values (AND (NOT (AND (OR SW N) (OR W SW))) (OR (NOT
    (OR NW NE)) (NOT (OR NW NE))))))
 (defun ADF4 ()
  (values (AND (NOT (OR (OR W SW) (OR NW NW))) (AND (AND
    (AND X N) (NOT NE)) (AND X N)))))
 (values (IF (OR (NOT (AND (GO-S) (GO-S))) (AND (NOT
    (ADF3)) (HOMING (GO-N))))
    (IF (HOMING (AND (GO-S) (ADF0)))
      (IF (ADF3) L NIL) (IF (HOMING (GO-S)) (IF (ADF1) L I)
      (IF (GO-W) L NIL)))
    (IF (OR (OR (GO-E) (HOMING (AND (GO-S) (ADF0))))
      (AND (ADF3) (ADF3)))
      (IF (HOMING (ADF2)) (IF (ADF3) L NIL) (IF (GO-S) NIL L))
      (IF (NOT (ADF1)) (IF (ADF0) NIL I)
      (IF (ADF1) L L)))))).
The best of generation 15 scores 56 hits and consists of parent A of generation 14 with the insertion of the four-point subtree (NOT (OR S X)) from parent B of generation 14. This offspring is shown below; the inserted crossover fragment is the final argument of the outer AND in ADF1:
(progn (defun ADF0 ()
  (values (OR (OR (AND (OR S X) (OR N SW)) (OR (AND (NOT
    (OR SW SW)) (NOT (AND X SW))) (OR (OR (AND X NW) (AND
    SE NE)) (AND (OR NE E) (OR X SW))))) (AND (OR (NOT S)
    (NOT (OR S SW))) (OR (AND W SE) (NOT N))))))
 (defun ADF1 ()
  (values (AND (AND (NOT (NOT X)) (NOT (OR S X))) (NOT
    (OR S X)))))
 (defun ADF2 ()
  (values (OR (AND (NOT (AND W E)) (OR (AND NW W) (NOT
    NW))) (OR (OR (AND N E) (AND S SE)) (OR (AND W (NOT
    NW)) (OR SE NE))))))
 (defun ADF3 ()
  (values (AND (NOT (AND (NOT SE) (OR W SW))) (OR (NOT
    (OR NW (NOT NW))) (AND (NOT S) (OR (NOT (NOT NW))
    (NOT SE)))))))
 (defun ADF4 ()
  (values (AND (NOT (OR (OR W SW) (OR NW NW))) (AND (AND
    (AND X N) (NOT NE)) (OR (OR N SE) (OR X E))))))
 (values (IF (OR (NOT (ADF4)) (AND (NOT (ADF3)) (OR (NOT
    (AND (GO-S) (GO-S))) (AND (NOT (AND (ADF3) (ADF3)))
    (HOMING (GO-N))))))
    (IF (HOMING (AND (GO-S) (ADF0)))
      (IF (GO-S) NIL L) (IF (HOMING (GO-S)) (IF (ADF1) L I)
      (IF (GO-W) L NIL)))
    (IF (OR (OR (GO-E) (ADF3)) (AND (ADF3) (ADF3)))
      (IF (HOMING (GO-N)) (IF (ADF1) L NIL) (IF (ADF3) NIL L))
      (IF (NOT (ADF1)) (IF (GO-E) NIL I)
      (IF (ADF1) L L)))))).
The result of the crossover is that ADF1 in the above offspring program from generation 15 is always NIL. This has the effect of making it impossible for the first two IFs that test ADF1 to return a classification of L. The third occurrence of ADF1 has the effect of preventing access to the fourth occurrence of ADF1.
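The crossover step and the claim that the offspring's ADF1 is always NIL can both be reproduced in a few lines. In the Python sketch below, trees are nested tuples, a crossover point is addressed by a path of argument indices, and the helper names are assumptions of the sketch; the two subtrees are transcribed from the listings of parent A and parent B.

```python
from itertools import product


def replace_subtree(tree, path, fragment):
    """Return a copy of tree with the subtree at path replaced by fragment."""
    if not path:
        return fragment
    index = path[0]
    return (tree[:index]
            + (replace_subtree(tree[index], path[1:], fragment),)
            + tree[index + 1:])


def evaluate(expr, pixels):
    """Evaluate a side-effect-free Boolean tree over a dict of sensor values."""
    if isinstance(expr, str):
        return pixels[expr]
    op, *args = expr
    if op == "NOT":
        return not evaluate(args[0], pixels)
    if op == "AND":
        return all(evaluate(arg, pixels) for arg in args)
    if op == "OR":
        return any(evaluate(arg, pixels) for arg in args)
    raise ValueError(f"unknown function: {op}")


def always_nil(expr, sensors=("X", "N", "S", "W", "SW")):
    """True if expr is NIL for every setting of the listed sensors."""
    return not any(evaluate(expr, dict(zip(sensors, values)))
                   for values in product([False, True], repeat=len(sensors)))


PARENT_A_ADF1 = ("AND",
                 ("AND", ("NOT", ("NOT", "X")), ("NOT", ("OR", "S", "X"))),
                 ("OR", "W", ("NOT", ("AND", "N", "SW"))))
FRAGMENT_FROM_B = ("NOT", ("OR", "S", "X"))  # the four-point subtree

# Replace the six-point subtree (the second argument of the outer AND).
OFFSPRING_ADF1 = replace_subtree(PARENT_A_ADF1, (2,), FRAGMENT_FROM_B)
```

The offspring's ADF1 requires X to be ON and, at the same time, (OR S X) to be NIL, which cannot both hold, so the exhaustive check confirms that it is the constant NIL.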
15.7 DETECTORS OF DIFFERENT SIZES AND SHAPES
Before solving the letter-recognition problem described above, we decided that there would be five detectors, each capable of examining a 3-by-3 pixel subarea of the overall 6-by-4 pixel grid.
Solving this problem does not depend on using square subareas. One might have decided to include detectors capable of examining subareas of different sizes and shapes. For example, one might have decided to have three 3-by-3 detectors, one 1-by-3 detector capable of examining a horizontal row of width 3, and one 3-by-1 detector capable of examining a vertical column of height 3.
For example, if ADF0 is the 1-by-3 detector capable of examining a horizontal row of width 3, the terminal set for ADF0 consists only of the three sensors X, E, and W. Similarly, if ADF1 is the 3-by-1 detector capable of examining a vertical column of height 3, the terminal set for ADF1 consists only of the three sensors X, N, and S.
The following 100%-correct 244-point program emerged in generation 22 of a run:
(progn (defun ADF0 ()
  (values (AND (AND X (AND (AND X (OR W E)) (AND X (OR W
    E)))) (AND (OR W E) (NOT E)))))
 (defun ADF1 ()
  (values (OR N (OR N N))))
 (defun ADF2 ()
  (values (NOT (OR (NOT SE) NE))))
 (defun ADF3 ()
  (values (OR (AND X E) (OR (OR W NE) (NOT (AND (OR E N)
    (NOT SW)))))))
 (defun ADF4 ()
  (values (OR (NOT (NOT SW)) (NOT SW))))
 (values (IF (OR (OR (OR (ADF3) (OR (GO-W) (HOMING (OR
    (HOMING (HOMING (OR (GO-W) (AND (OR (GO-S) (OR (GO-S)
    (GO-W))) (OR (GO-S) (ADF3)))))) (AND (NOT (AND (OR
    (GO-S) (GO-S)) (OR (GO-S) (GO-W)))) (OR (GO-S) (OR
    (GO-S) (OR (OR (ADF3) (ADF3)) (OR (GO-S)
    (GO-S)))))))))) (GO-S)) (OR (GO-E) (GO-W)))
   (IF (OR (ADF3) (OR (GO-W) (GO-S))) (IF (ADF3) NIL I)
    (IF (AND (GO-E) (NOT (HOMING (OR (HOMING (OR (HOMING
      (HOMING (ADF2))) (AND (NOT (AND (OR (GO-S) (GO-E))
      (OR (ADF0) (ADF3)))) (OR (GO-S) (GO-E))))) (OR (GO-E)
      (GO-E))))))
     (IF (OR (GO-S) (ADF3)) L I)
     (IF (OR (OR (ADF3) (OR (GO-S) (GO-W))) (OR (GO-W)
       (ADF3)))
      (IF (AND (GO-E) (NOT (GO-S)))
       (IF (AND (ADF1) (AND (NOT (AND (OR (GO-S) (GO-S))
         (OR (OR (ADF3) (ADF3)) (OR (ADF3) (OR (GO-S)
         (GO-S)))))) (OR (GO-S) (OR (GO-S) (OR (OR (ADF3)
         (ADF3)) (OR (GO-S) (GO-S))))))) L I)
       (IF (OR (GO-W) (AND (NOT (AND (OR (GO-E) (GO-W))
         (OR (GO-S) (GO-S)))) (OR (GO-S) (AND (OR (GO-E)
         (GO-S)) (HOMING (ADF2)))))) NIL NIL))
      (IF (AND (GO-S) (NOT (HOMING (OR (HOMING (HOMING
        (ADF2))) (GO-W)))))
       (IF (OR (OR (GO-W) (GO-S)) (AND (NOT (AND (OR (GO-S)
         (GO-S)) (OR (GO-S) (GO-S)))) (OR (GO-S) (GO-S)))) L
         NIL)
       (IF (ADF0) L NIL)))))
   (IF (ADF0) L NIL)))).
In this program ADF0 examines only X, E, and W, and ADF1 examines only
N (although it is capable of also examining S and X).
15.8 TRANSLATION-INVARIANT LETTER RECOGNITION
In the letter-recognition problem described above, the letters were situated at
a particular location within the 6-by-4 pixel grid. Genetic programming is
also capable of solving translation-invariant letter-recognition problems.
We introduced a wrapper that caused the execution of the overall program
from each of the 24 possible starting locations in the 6-by-4 grid area. Equivalently, one can view the 6-by-4 grid area as a cellular space in which the same
overall program is embedded at each location (i.e., a cellular automaton). As
one would expect, this version of the problem is far more time-consuming
than the original highly time-consuming version. To reduce the amount of
computer time required, we simplified the problem to one of identifying only
the letter I (and the negative category NIL). If any of the 24 executions of the program identifies a pattern as the letter I, the pattern is classified as an
I. Otherwise, the pattern is classified as NIL. Sixty fitness cases were used
(all but the 18 patterns of figure 15.8).
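The wrapper described above can be sketched in Python. This is only an illustration under stated assumptions: the grid is a list of 6 rows of 4 pixels, `toy_detector` is a hypothetical stand-in for an evolved program (it is not one of the evolved programs in this chapter), and the function names are ours.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]  # assumed layout: 6 rows by 4 columns of 0/1 pixels


def start_locations(grid: Grid) -> List[Tuple[int, int]]:
    """Every (row, col) cell from which the overall program is executed."""
    return [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]


def classify_translation_invariant(
    program: Callable[[Grid, int, int], str], grid: Grid
) -> str:
    """Return "I" if any execution of the program reports "I", else "NIL"."""
    for row, col in start_locations(grid):
        if program(grid, row, col) == "I":
            return "I"
    return "NIL"


def toy_detector(grid: Grid, row: int, col: int) -> str:
    """Hypothetical program: reports "I" for three stacked 'on' pixels."""
    if row + 2 < len(grid) and all(grid[row + k][col] for k in range(3)):
        return "I"
    return "NIL"
```

Because the wrapper takes the logical OR over all starting locations, a single positive execution is enough to classify the whole pattern as I.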
In one run, the following 100%-correct program capable of translation-invariant recognition of the letter I emerged in generation 32:
(progn (defun ADF0 ()
(values (OR (AND (NOT (NOT SW)) (OR (NOT E) (OR S E)))
(NOT (AND (AND (NOT SE) (OR S S)) (NOT E))))))
(defun ADF1 ()
(values (NOT (OR E SW))))
(defun ADF2 ()
(values (OR (OR (OR (AND S X) (NOT NE)) (OR (NOT N) (OR
(OR (OR (OR (AND S X) (NOT NE)) (NOT N)) (NOT NE))
(OR (OR (OR (OR X NW) (NOT NE)) (OR (NOT N) (AND S
S))) (NOT (NOT (AND S S))))))) (NOT (NOT (NOT (AND [illegible]
X))))))))
(defun ADF3 ()
(values (OR (NOT (OR [illegible] (AND SE SW))) (OR (AND
(AND SE SW) (NOT W)) (AND E W)))))
(defun ADF4 ()
(values (AND (AND (NOT (OR X W)) (OR (NOT S) (OR N X)))
(NOT (NOT (OR SE X))))))
(values (IF (AND (NOT (AND (ADF1) (ADF0))) (AND (AND
(ADF1) (AND (GO-N) (NOT (ADF0)))) (OR (ADF3) (AND
(ADF1) (AND (GO-N) (NOT (ADF0))))))) (IF (OR (GO-N)
(OR (ADF0) (GO-W))) (IF (GO-N) NIL NIL) (IF (OR (OR
(GO-N) (AND (ADF1) (ADF0))) (GO-W)) (IF (GO-N) NIL
NIL) (IF (GO-W) NIL I))) (IF (ADF0) NIL NIL)))).
The result-producing branch of this program returns either I or NIL.
15.9 SUMMARY
This chapter has described an approach for simultaneously discovering
detectors and a way of combining the detectors to solve a problem. The
approach was illustrated using a problem of letter recognition. The genetically evolved detectors were repeatedly invoked to produce a solution to the
overall problem.
See also Koza 1993a.
16 Flushes and Four-of-a-Kinds in a Pinochle Deck
The problem of recognizing a flush or four-of-a-kind in a five-card hand from
a pinochle deck is another example of a problem whose solution can be facilitated by the automatic discovery of reusable, initially unknown detectors.
16.1 THE FLUSH PROBLEM
In this problem, five cards are drawn (without replacement) from a 24-card
pinochle deck. The denomination of a card can be ace, king, queen, jack, ten, or
nine; the suit of a card can be club, diamond, heart, or spade. A five-card hand
is a flush if all five cards are of the same suit. A hand contains a four-of-a-kind if
four of the five cards share the same denomination.
We first consider the flush problem, where the goal is to discover a computer program that determines whether a given hand dealt from a 24-card
pinochle deck is a flush.
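The size of this problem environment can be checked by brute force. The following Python sketch enumerates every five-card hand from the 24-card deck; the tuple representation of a card as (denomination, suit) and the helper names are our own assumptions for illustration.

```python
from itertools import combinations

SUITS = ("CLUB", "DIAMOND", "HEART", "SPADE")
DENOMS = ("ACE", "KING", "QUEEN", "JACK", "TEN", "NINE")
# One card per (denomination, suit) pair: 6 x 4 = 24 cards.
DECK = [(denom, suit) for suit in SUITS for denom in DENOMS]


def is_flush(hand):
    """All five cards share one suit."""
    return len({suit for _, suit in hand}) == 1


def has_four_of_a_kind(hand):
    """At least four of the five cards share one denomination."""
    denoms = [denom for denom, _ in hand]
    return any(denoms.count(d) >= 4 for d in set(denoms))


hands = list(combinations(DECK, 5))
n_hands = len(hands)
n_flushes = sum(is_flush(h) for h in hands)
n_fours = sum(has_four_of_a_kind(h) for h in hands)
```

Under these assumptions `n_hands` comes to 42,504, the number of possible hands used throughout this chapter. Flushes and four-of-a-kinds are both rare (24 and 120 hands, respectively), and no hand can be both, since four cards of one denomination necessarily span all four suits.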
16.2 PREPARATORY STEPS WITHOUT ADFs
We envisage that the input to each program for the flush problem will be the
five cards of the hand and that each program will be a decision tree consisting
of a composition of decision-making functions that return the identification
FLUSH or NIL.
The terminal set, $\mathcal{T}$, for the flush problem consists of the five cards in the
hand and the constants FLUSH and NIL for naming the category into which
the hand is classified.
$\mathcal{T}$ = {CARD0, CARD1, CARD2, CARD3, CARD4, NIL, FLUSH}.
The function set, $\mathcal{F}$, consists of
$\mathcal{F}$ = {SUIT, DENOM, IF, AND, OR, NOT, EQ}
with an argument map of
{1, 1, 3, 2, 2, 1, 2}.
SUIT returns the suit of a card; DENOM returns the denomination of a
card. Both SUIT and DENOM return NIL if the argument is anything other
than a card. For example, if CARD0 is the ace of hearts, (SUIT CARD0)
returns HEART; (DENOM CARD0) returns ACE; and (DENOM FLUSH)
returns NIL.
In addition, IF, AND, OR, NOT, and EQ (equal) are the usual LISP functions.
Each program in the population could reasonably be an unconstrained composition of functions from the function set, $\mathcal{F}$, and terminals from the terminal
set, $\mathcal{T}$; however, experience indicates that programs produced by genetic
programming that are composed of functions playing distinctly different roles can be
very difficult to analyze, understand, and verify. In this problem, the role of
two of the functions (SUIT and DENOM) is to detect the characteristics of a
card. The EQ function tests for equality. The role of three of the functions (AND,
OR, and NOT) is to perform logical analysis. Finally, the role of the conditional
decision-making operator IF is to perform the classification of the hand into
the two categories. Experience also indicates that we can greatly enhance our
ability to analyze, understand, and verify such genetically evolved programs
if we impose a constrained syntactic structure on the programs.
Specifically, we structure each program as a decision tree whose root node
must always be an IF. Programs typically contain many additional IFs. The
first (condition) argument of every IF must be a composition only of EQ, AND,
OR, and NOT; the five cards (CARD0, CARD1, CARD2, CARD3, and CARD4); and
the detecting functions (SUIT and DENOM). In particular, the first argument of
an IF does not contain another IF. The second (then) and third (else) arguments of each IF contain only other IF operators or categories (FLUSH or
NIL). The effect of these constraints is that the condition part of each IF is
truly a condition and the action part of each IF is truly a consequence (i.e.,
another IF or a category).
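The rules of construction above amount to a small grammar, which can be checked mechanically. The following Python sketch is one possible reading of those rules; representing a program as nested tuples such as ("IF", condition, then, else) and the function names are assumptions made for this illustration only.

```python
CARDS = {"CARD0", "CARD1", "CARD2", "CARD3", "CARD4"}
CATEGORIES = {"FLUSH", "NIL"}
# Functions allowed inside the condition part of an IF, with their arities.
CONDITION_ARITY = {"EQ": 2, "AND": 2, "OR": 2, "NOT": 1, "SUIT": 1, "DENOM": 1}


def is_condition(expr):
    """A condition is a card or a composition of the condition functions."""
    if isinstance(expr, str):
        return expr in CARDS
    if not isinstance(expr, tuple) or not expr:
        return False
    op, *args = expr
    return CONDITION_ARITY.get(op) == len(args) and all(
        is_condition(arg) for arg in args
    )


def is_action(expr):
    """An action is a category or another well-formed IF."""
    if isinstance(expr, str):
        return expr in CATEGORIES
    return is_valid_program(expr)


def is_valid_program(expr):
    """The root (and every nested action) must be a three-argument IF."""
    if not isinstance(expr, tuple) or len(expr) != 4 or expr[0] != "IF":
        return False
    _, condition, then_part, else_part = expr
    return (
        is_condition(condition) and is_action(then_part) and is_action(else_part)
    )
```

A structure-preserving crossover operator would need to respect the same three point types (root IF, condition point, action point), so that offspring continue to satisfy a check of this kind.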
It is not clear whether the constrained syntactic structure hinders or improves the performance of genetic programming in discovering a suitable
program; however, it is clear that it substantially increases our ability to
understand the genetically evolved programs. The use of the constrained
syntactic structure does not, however, entirely eliminate the problem of opacity; the analysis of genetically evolved programs often still requires considerable effort.
We do not have sufficient computer time to measure the fitness of each
individual in the population of each generation of a run against all 42,504
possible five-card pinochle hands. Consequently, we construct a sampling of
fitness cases for the purpose of evolving a solution to the problem. For reasons
that will soon become clear, we call this sampling the in-sample fitness cases for
the problem. The in-sample fitness cases for this problem consist of 1,000 random hands, of which half are random flushes and half are random hands that
are not flushes. To be useful, a randomized set of fitness cases such as this
must be sufficiently large that it is representative of the problem environment
as a whole.
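One way to build such a balanced sample is sketched below in Python. The text does not say how the random hands were drawn, so the rejection-sampling approach, the fixed seed, and all names here are assumptions for illustration.

```python
import random

SUITS = ("CLUB", "DIAMOND", "HEART", "SPADE")
DENOMS = ("ACE", "KING", "QUEEN", "JACK", "TEN", "NINE")
DECK = [(denom, suit) for suit in SUITS for denom in DENOMS]


def is_flush(hand):
    return len({suit for _, suit in hand}) == 1


def random_flush(rng):
    """Pick a suit, then five of its six denominations in random order."""
    suit = rng.choice(SUITS)
    return [(denom, suit) for denom in rng.sample(DENOMS, 5)]


def random_non_flush(rng):
    """Draw five cards without replacement; retry on the rare flush."""
    while True:
        hand = rng.sample(DECK, 5)
        if not is_flush(hand):
            return hand


def make_in_sample_cases(n_cases=1000, seed=0):
    """Return (hand, label) pairs: label 1 for FLUSH, 0 for NIL."""
    rng = random.Random(seed)
    half = n_cases // 2
    cases = [(random_flush(rng), 1) for _ in range(half)]
    cases += [(random_non_flush(rng), 0) for _ in range(n_cases - half)]
    rng.shuffle(cases)
    return cases
```

Balancing the two categories in this way deliberately over-represents flushes, which make up only a tiny fraction of all possible hands.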
When a genetically evolved program in the population is tested against a
particular fitness case, the outcome can be
• a true-positive (i.e., the program correctly predicts that the given hand is a
flush when the hand is, in fact, a flush),
• a true-negative (i.e., the program correctly predicts that the given hand is
not a flush when the hand is, in fact, not a flush),
• a false-positive (i.e., the program "overpredicts" that the given hand is a
flush when the hand is, in fact, not a flush), or
• a false-negative (i.e., the program "underpredicts" that the given hand is
not a flush when the hand is, in fact, a flush).
Fitness will measure how well a genetically evolved program predicts
whether a given five-card hand is a flush. Consider a first vector of $N_{fc}$ = 1,000
correct answers (with the integer 1 representing FLUSH and the integer 0 representing NIL) for the set of 1,000 in-sample fitness cases in a space of dimensionality $N_{fc}$. Now consider a second vector of $N_{fc}$ = 1,000 predictions (1 or 0)
produced by a particular genetically evolved program for the set of 1,000
in-sample fitness cases. Suppose each vector is transformed into a zero-mean
vector by subtracting the average value of all its components from each component. Fitness can be measured by the correlation coefficient C. Specifically,
the in-sample correlation, C, is the cosine of the angle in this space of dimensionality $N_{fc}$ between the zero-mean vector of correct answers and the zero-mean vector of predictions. A correlation C of -1.0 indicates that the pair of
vectors point in opposite directions in $N_{fc}$-space (i.e., greatest negative correlation); a correlation of +1.0 indicates coincident vectors (i.e., greatest positive
correlation); a correlation C of 0.0 indicates orthogonality (i.e., no correlation).
Additional discussion of correlation and other measures of agreement between observed data and predicting programs is contained in subsection 18.5.2.
The in-sample correlation, C, lends itself immediately to being the measure
of raw fitness for a genetically evolved computer program. Since raw fitness
ranges between -1.0 and +1.0 (higher values being better), standardized fitness can then be defined as
$$\frac{1 - C}{2}$$
Standardized fitness ranges between 0.0 and +1.0, lower values being better
and a value of 0 being best. Specifically, a standardized fitness of 0 indicates
perfect agreement between the predicting program and the observed reality;
a standardized fitness of +1.0 indicates perfect disagreement; a standardized
fitness of 0.50 indicates that the predictor is no better than random.
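The correlation measure and its conversion to standardized fitness can be sketched in a few lines of Python. The function names are ours, and returning 0.0 when either zero-mean vector has zero length (for example, a program that always predicts NIL) is an assumption of this sketch; the text does not say how that degenerate case is handled.

```python
import math


def correlation(correct, predicted):
    """Cosine of the angle between the two zero-mean vectors."""
    n = len(correct)
    mean_c = sum(correct) / n
    mean_p = sum(predicted) / n
    zero_mean_c = [c - mean_c for c in correct]
    zero_mean_p = [p - mean_p for p in predicted]
    dot = sum(a * b for a, b in zip(zero_mean_c, zero_mean_p))
    norm = math.sqrt(sum(a * a for a in zero_mean_c)) * math.sqrt(
        sum(b * b for b in zero_mean_p)
    )
    if norm == 0.0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return dot / norm


def standardized_fitness(c):
    """Map raw fitness C in [-1, +1] onto [0, 1], with 0 being best."""
    return (1.0 - c) / 2.0
```

For 0/1 vectors this quantity is the ordinary Pearson correlation coefficient, so a program that gets every fitness case wrong scores C = -1.0 (standardized fitness 1.0) rather than being treated as merely uninformative.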
Table 16.1 summarizes the key features of the flush problem without automatically defined functions.
16.3 RESULTS WITHOUT ADFs
The following 54-point program from generation 20 of one run of the flush
problem achieved 500 true positives and 500 true negatives, with no false
positives or false negatives. Consequently, it achieved a correlation of 1.00
and a standardized fitness of 0.0.
(IF (EQ (SUIT CARD2) (SUIT CARD4)) (IF (EQ (SUIT CARD1) (SUIT
CARD3)) (IF (EQ (SUIT CARD0) (SUIT CARD3)) (IF (EQ (SUIT CARD2)
(SUIT CARD3)) (IF (EQ (SUIT CARD0) (SUIT CARD3)) (IF CARD1 FLUSH
FLUSH) (IF CARD2 NIL NIL)) (IF CARD1 NIL FLUSH)) (IF CARD0 NIL
FLUSH)) (IF CARD1 NIL FLUSH)) (IF CARD1 NIL FLUSH)).
This best-of-run program from generation 20 can be rewritten as
1 (IF (EQ (SUIT CARD2) (SUIT CARD4))
2     (IF (EQ (SUIT CARD1) (SUIT CARD3))
3         (IF (EQ (SUIT CARD0) (SUIT CARD3))
4             (IF (EQ (SUIT CARD2) (SUIT CARD3))
5                 (IF (EQ (SUIT CARD0) (SUIT CARD3)) FLUSH NIL)
6                 NIL)
7             NIL)
8         NIL)
9     NIL).
When so simplified, it can be seen that the suit of CARD3 is compared to the
suits of CARD1, CARD0, and CARD2 on lines 2, 3, and 4 and then recompared
to the suit of CARD0 on line 5. The suit of CARD2 is compared to the suit of
CARD4 on the first line. If all of these comparisons indicate equality, the hand
is correctly classified as a flush. If any comparison fails, the hand is classified
as not being a flush.
The true measure of performance for a predicting program is how well it
generalizes to different cases from its problem environment. We are able to
say that the above best-of-run program from generation 20 successfully generalizes perfectly to the entire problem environment for this particular problem because we fully understand the nature of the problem environment and
because we are able to analyze the actual operation of the program in order to
verify that it does indeed solve the problem. In general, it is not possible to
verify a genetically evolved program in this way for more complicated classification problems.
As an alternative to verifying a genetically evolved program by analytic
means, we can cross-validate the performance of such a program by testing it
against unseen additional fitness cases (called the out-of-sample fitness cases).
For this problem, there are 42,504 possible five-card hands from a pinochle
deck. Although it would be prohibitively time-consuming to measure the
fitness of all 4,000 individual programs in all 51 generations of every run of
genetic programming against this entire universe of 42,504 fitness cases, we
can readily measure the fitness for the one best-of-run program from generation 20 against these 42,504 fitness cases. When we test the generality of the
best-of-run program by testing it against all possible 42,504 hands, we find
that the two vectors in 42,504-space agree exactly. The value of out-of-sample
correlation for the best-of-run program over the entire set of 42,504 five-card
pinochle hands is therefore also 1.00.
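The exhaustive cross-validation can be reproduced in a few lines. The Python sketch below is our transliteration of the simplified nine-line program above, not the original LISP; the (denomination, suit) card representation and the names are assumptions for illustration.

```python
from itertools import combinations

SUITS = ("CLUB", "DIAMOND", "HEART", "SPADE")
DENOMS = ("ACE", "KING", "QUEEN", "JACK", "TEN", "NINE")
DECK = [(denom, suit) for suit in SUITS for denom in DENOMS]


def suit(card):
    return card[1]


def evolved_flush(c0, c1, c2, c3, c4):
    """Transliteration of the simplified best-of-run program (lines 1 to 9)."""
    if suit(c2) == suit(c4):                      # line 1
        if suit(c1) == suit(c3):                  # line 2
            if suit(c0) == suit(c3):              # line 3
                if suit(c2) == suit(c3):          # line 4
                    if suit(c0) == suit(c3):      # line 5 (redundant recheck)
                        return "FLUSH"
    return "NIL"


def is_flush(hand):
    return len({suit(card) for card in hand}) == 1


all_hands = list(combinations(DECK, 5))
mismatches = sum(
    (evolved_flush(*hand) == "FLUSH") != is_flush(hand) for hand in all_hands
)
```

Here `mismatches` counts the hands on which the program and the definition of a flush disagree; a value of zero over all 42,504 hands corresponds to the out-of-sample correlation of 1.00 reported above.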
Table 16.1 Tableau without ADFs for the flush problem.
Objective: Find a program that identifies whether a given five-card hand from a pinochle deck is a flush.
Terminal set without ADFs: CARD0, CARD1, CARD2, CARD3, CARD4, NIL, and FLUSH.
Function set without ADFs: SUIT, DENOM, IF, AND, OR, NOT, and EQ.
Fitness cases: 1,000 in-sample fitness cases, of which half are random flushes and half are random hands that are not flushes.
Raw fitness: Correlation C (ranging from -1.0 to +1.0).
Standardized fitness: Standardized fitness is $\frac{1 - C}{2}$.
Hits: Not used for this problem.
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program scores the maximum number of hits.
Types of points:
• The IF operator (which is always at the root).
• Point in first argument (condition part) of IF.
• Point in second (then) or third (else) argument of IF.
Rules of construction:
• The root node must be an IF.
• The condition (first) argument of an IF may contain any composition of the Boolean operators (EQ, AND, OR, and NOT), the five cards (CARD0, CARD1, CARD2, CARD3, and CARD4), and the detecting functions (SUIT and DENOM without automatically defined functions but ADF0 and ADF1 with automatically defined functions).
• The second (then) and third (else) arguments of an IF contain only other IFs or references to the terminals for the two categories (FLUSH or NIL).
Since this out-of-sample set of 42,504 fitness cases happens to be an exhaustive set of the entire problem environment, we can say that this best-of-run program from generation 20 is a 100%-correct solution to the overall
problem of identifying flushes from a pinochle deck. Thus, for this particular
problem, we have both analytically verified and empirically verified that the
genetically evolved program is a perfect solution to the problem.
Note that for more complicated problems, cross-validation does not usually involve testing a predicting program against all possible unseen fitness
cases; it merely involves testing against different set(s) of previously unseen
fitness cases. We were able to do an exhaustive cross-validation here only
because the number of possible five-card hands is only 42,504.
It is interesting to note that we first tried this problem with only 250
randomly chosen in-sample fitness cases. In those preliminary runs, we
learned that a set of fitness cases of such a small size is not sufficiently
representative of the overall problem environment to permit evolution of
100%-correct predicting programs for this problem (except perhaps by
coincidence). Genetic programming routinely evolved programs that were
capable of perfectly classifying the sets of 250 in-sample fitness cases (i.e.,
these evolved programs scored only true positives and true negatives on
the 250 in-sample fitness cases). However, these programs were overly
specialized to the particular fitness cases used in the evolutionary process; they did not contain a complete chain of equality comparisons necessary to correctly handle all possible five-card hands; and they did not
generalize. For example, one of these evolved programs verified the equality of the suits of CARD0 and CARD1 and then verified the equality of the
suits of CARD2, CARD3, and CARD4, but did not verify that the suit common to CARD0 and CARD1 was the same as the suit common to CARD2,
CARD3, and CARD4. As it happened, none of the 250 in-sample hands was
a non-flush in which the suit common to CARD0 and CARD1 was not the
same suit that was in common with CARD2, CARD3, and CARD4. Of course,
when tested against previously unseen hands, this program did not correctly classify all the new hands. That is, this predicting program did not
generalize well and failed to score 100% in the cross-validation test. Genetic
programming did not make a mistake in evolving this program. Quite to
the contrary, genetic programming did precisely what it was told to do: it
evolved a highly fit program that successfully grappled with the given
problem environment.
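The gap in the overspecialized program is easy to exhibit. The Python sketch below encodes our reading of the incomplete chain of comparisons described above (it is not the evolved LISP program itself) together with one hypothetical hand that falls through the gap.

```python
def suit(card):
    return card[1]  # cards are assumed to be (denomination, suit) tuples


def overspecialized(c0, c1, c2, c3, c4):
    """Checks CARD0 = CARD1 and CARD2 = CARD3 = CARD4, but never links them."""
    if suit(c0) == suit(c1) and suit(c2) == suit(c3) == suit(c4):
        return "FLUSH"
    return "NIL"


def is_flush(hand):
    return len({suit(card) for card in hand}) == 1


# Two clubs followed by three hearts: not a flush, yet both partial
# chains of equality comparisons succeed.
counterexample = [
    ("ACE", "CLUB"), ("KING", "CLUB"),
    ("ACE", "HEART"), ("KING", "HEART"), ("QUEEN", "HEART"),
]
```

A hand of this shape is a false positive for the overspecialized program. Since no such hand happened to occur among the 250 in-sample fitness cases, nothing in the fitness measure penalized the missing comparison.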
This result is another example of the principle that you get what you pay
for with genetic programming. Genetic programming breeds highly fit programs based on the available fitness measure operating on available fitness
cases. If the fitness cases are not sufficiently representative of the entire
problem environment, the genetically evolved solution will not solve the more
general problem represented by the full problem environment (i.e., will not
generalize). The full problem environment resides in the mind of the user of
genetic programming. If a learning paradigm is to successfully generalize
to the full problem, the full problem must be communicated by the user to
the learning paradigm in some way. Genetic programming is not clairvoyant;
it relies on the user to provide a sufficiently representative set of fitness
cases to enable genetic programming to solve the problem residing in the
mind of the user.
16.4 PREPARATORY STEPS WITH ADFs
In applying genetic programming with automatically defined functions
to the problem of identifying flushes, we envisage that automatically defined functions will be used to define detectors that will examine the given
five-card hand and that the result-producing branch will perform some
kind of logical analysis on the results produced by the detectors in order
to classify the hand. We decided that each overall program in the population will consist of a result-producing branch and two three-argument
function-defining branches.
The terminal set, $\mathcal{T}_{adf}$, for each of the three-argument defined functions ADF0
and ADF1 is
$\mathcal{T}_{adf}$ = {ARG0, ARG1, ARG2}.
The function set, $\mathcal{F}_{adf0}$, for ADF0 is
$\mathcal{F}_{adf0}$ = {SUIT, DENOM, AND, OR, NOT, EQ}
with an argument map of
{1, 1, 2, 2, 1, 2}.
Because ADF1 may not hierarchically refer to ADF0, the function set, $\mathcal{F}_{adf1}$,
for ADF1 is
$\mathcal{F}_{adf1}$ = {SUIT, DENOM, AND, OR, NOT, EQ}
with an argument map of
{1, 1, 2, 2, 1, 2}.
The body of ADF0 is a composition of the primitive functions from its function set, $\mathcal{F}_{adf0}$, and the terminals from the terminal set, $\mathcal{T}_{adf}$. Similarly, ADF1 is
a composition of elements of $\mathcal{F}_{adf1}$ and $\mathcal{T}_{adf}$.
The terminal set, $\mathcal{T}_{rpb}$, for the result-producing branch is
$\mathcal{T}_{rpb}$ = {CARD0, CARD1, CARD2, CARD3, CARD4, NIL, FLUSH}.
The function set, $\mathcal{F}_{rpb}$, for the result-producing branch is
$\mathcal{F}_{rpb}$ = {ADF0, ADF1, IF, AND, OR, NOT}
with an argument map of
{3, 3, 3, 2, 2, 1}.
The result-producing branch is a composition of the functions from the
function set, $\mathcal{F}_{rpb}$, and terminals from the terminal set, $\mathcal{T}_{rpb}$. The constrained
syntactic structure created by the rules of construction described in table 16.1
applies, with the exception that when automatically defined functions are being
used, the two defined functions (ADF0 and ADF1) are the detecting functions
(rather than DENOM and SUIT).
Table 16.2 summarizes the key features of the flush problem with automatically defined functions.
Table 16.2 Tableau with ADFs for the flush problem.
Objective: Find a program that identifies whether a given five-card hand from a pinochle deck is a flush.
Architecture of the overall program with ADFs: One result-producing branch and two three-argument function-defining branches, with no hierarchical references.
Parameters: Branch typing.
Terminal set for the result-producing branch: CARD0, CARD1, CARD2, CARD3, CARD4, NIL, and FLUSH.
Function set for the result-producing branch: ADF0, ADF1, IF, EQ, AND, OR, and NOT.
Terminal set for the two function-defining branches ADF0 and ADF1: ARG0, ARG1, and ARG2.
Function set for the two function-defining branches ADF0 and ADF1: EQ, SUIT, DENOM, AND, OR, and NOT.
16.5 RESULTS WITH ADFs
In generation 13 of one run, the following 63-point program achieved 500
true positives and 500 true negatives, with no false positives or false negatives.
The correlation of this program is 1.00.
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (EQ ARG2 ARG0)))
(defun ADF1 (ARG0 ARG1 ARG2)
(values (NOT (EQ (SUIT ARG2) (OR (SUIT ARG1) ARG1)))))
(values (IF (ADF1 CARD0 CARD2 CARD4) (IF (ADF0 CARD2
CARD0 CARD4) FLUSH NIL) (IF (ADF1 CARD2 CARD1 CARD2)
(IF CARD3 NIL NIL) (IF (ADF1 CARD1 CARD3 CARD4) (IF
CARD2 NIL NIL) (IF (ADF1 CARD2 CARD1 CARD2) (IF CARD3
NIL NIL) (IF (ADF1 CARD3 CARD0 CARD3) (IF CARD3 NIL
NIL) (IF CARD3 FLUSH NIL)))))))).
We can greatly simplify the above program by noting that ADF1 tests for
the inequality of the suits of its second and third arguments (ignoring ARG0).
Moreover, as it happens, the one occurrence of ADF0 is irrelevant, since it
merely provides an indirect way for NIL to be returned for a particular situation where NIL should be returned. If we define a new function S= for
two-way suit equality (and delete each unreferenced argument), the above
program can be rewritten and simplified to
(IF (S= CARD2 CARD4)
    (IF (S= CARD1 CARD2)
        (IF (S= CARD3 CARD4)
            (IF (S= CARD1 CARD2)
                (IF (S= CARD0 CARD3) FLUSH NIL)
                NIL)
            NIL)
        NIL)
    NIL).
It is now clear that this program correctly performs the desired classification.
When cross-validated against the exhaustive set of 42,504 fitness cases, this
program achieves a correlation of 1.00 with no false positives and no false
negatives. Thus, we have both analytically verified and empirically verified
(cross-validated) that the genetically evolved program is a perfect solution to
the problem.
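The behavior of the two evolved detectors can be emulated directly. In the Python sketch below, NIL is modeled as None, Python's `or` stands in for the LISP OR (both return the first non-false argument), and cards are (denomination, suit) tuples; these modeling choices and all names are our assumptions, and the control flow is a flattened transliteration of the evolved result-producing branch.

```python
from itertools import combinations

SUITS = ("CLUB", "DIAMOND", "HEART", "SPADE")
DENOMS = ("ACE", "KING", "QUEEN", "JACK", "TEN", "NINE")
DECK = [(denom, suit) for suit in SUITS for denom in DENOMS]


def SUIT(x):
    """Suit of a card; None (NIL) for anything that is not a card."""
    return x[1] if isinstance(x, tuple) else None


def adf0(arg0, arg1, arg2):
    return arg2 == arg0                      # (EQ ARG2 ARG0)


def adf1(arg0, arg1, arg2):
    # (NOT (EQ (SUIT ARG2) (OR (SUIT ARG1) ARG1)))
    return not (SUIT(arg2) == (SUIT(arg1) or arg1))


def evolved_with_adfs(c0, c1, c2, c3, c4):
    if adf1(c0, c2, c4):                     # suits of CARD2, CARD4 differ
        return "FLUSH" if adf0(c2, c0, c4) else "NIL"
    if adf1(c2, c1, c2):                     # suits of CARD1, CARD2 differ
        return "NIL"
    if adf1(c1, c3, c4):                     # suits of CARD3, CARD4 differ
        return "NIL"
    if adf1(c2, c1, c2):                     # repeated test
        return "NIL"
    if adf1(c3, c0, c3):                     # suits of CARD0, CARD3 differ
        return "NIL"
    return "FLUSH"


def is_flush(hand):
    return len({SUIT(card) for card in hand}) == 1


adf_mismatches = sum(
    (evolved_with_adfs(*h) == "FLUSH") != is_flush(h)
    for h in combinations(DECK, 5)
)
```

Because the five cards of a hand are distinct, the single call to `adf0` compares two different cards and can never select FLUSH, which is why that occurrence is irrelevant to the result.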
16.6 FLUSHES AND FOUR-OF-A-KINDS
In this section, the problem is changed to require identification of both flushes
and an additional type of hand, the four-of-a-kind. The 1,000 fitness cases are
modified for this new three-way classification problem so that a third are
flushes, a third are four-of-a-kinds, and a third are neither. If these three symbolic categories are represented by the numerical values -1, 0, and +1, respectively, raw fitness (correlation) can again be computed as the cosine of the
angle in $N_{fc}$-space between the zero-mean vector of correct answers and the
zero-mean vector of predictions.
This very time-consuming three-way problem is considerably more difficult to solve than the flush problem described earlier in this chapter. Each of
several runs without automatically defined functions exhibited progressively
better fitness, coming close to a perfect score; however, none of our runs
without automatically defined functions ever produced a solution to this
problem.
Genetic programming did produce solutions to this new problem when
automatically defined functions were used; however, this new problem is so
time-consuming that it is not feasible to make enough runs to obtain enough
solutions for the construction of a meaningful performance curve. Nonetheless, it is instructive to look at the following especially interesting 294-point
program, which appeared with a correlation of 1.00 in generation 43 of one
particular run with automatically defined functions. This program correctly
classified every flush in the set of fitness cases as a flush, every four-of-a-kind
as a four-of-a-kind, and every other hand as NIL.
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (AND (EQ (SUIT ARG2) (AND (EQ (OR (NOT (SUIT
ARG2)) (SUIT ARG1)) (SUIT ARG0)) (OR (NOT ARG0) (SUIT
ARG1)))) (OR (NOT ARG0) (SUIT ARG0)))))
(defun ADF1 (ARG0 ARG1 ARG2)
(values (EQ (DENOM ARG2) (DENOM ARG1))))
(values (IF (ADF0 CARD1 (OR (ADF1 (ADF1 (ADF0 CARD4 CARD1
CARD1) CARD3 (NOT CARD1)) (AND CARD2 CARD1) (AND CARD2
CARD2)) (AND CARD2 CARD2)) CARD0) (IF (ADF0 CARD3 CARD1
CARD4) (IF CARD4 FLUSH FLUSH) (IF CARD3 NIL FLUSH)) (IF
(ADF1 (ADF0 CARD4 CARD4 CARD2) (AND CARD2 CARD1) (OR
CARD3 CARD3)) (IF (ADF1 (OR CARD3 CARD3) (AND CARD2
CARD4) (AND CARD2 CARD1)) (IF (ADF1 (OR CARD3 CARD3)
(AND CARD2 CARD1) (AND CARD2 CARD2)) (IF CARD2 FOUR-OF-A-KIND FOUR-OF-A-KIND) (IF (ADF1 (ADF0 CARD4 CARD4
CARD2) (OR CARD4 CARD3) (OR CARD0 CARD3)) (IF CARD2
FOUR-OF-A-KIND NIL) (IF CARD4 NIL NIL))) (IF (ADF1
(ADF0 CARD4 CARD4 CARD2) (AND CARD2 CARD1) (OR CARD0
CARD3)) (IF (ADF1 (ADF0 CARD4 CARD4 CARD2) (AND CARD2
CARD2) (OR CARD0 CARD3)) (IF CARD2 FOUR-OF-A-KIND FOUR-OF-A-KIND) (IF CARD4 NIL NIL)) (IF CARD4 NIL NIL))) (IF
(ADF1 (ADF0 CARD4 CARD4 CARD2) (OR CARD4 CARD3) (OR
CARD0 CARD3)) (IF (ADF1 (ADF1 (ADF1 CARD3 CARD2 CARD1)
(AND CARD3 CARD4) (OR CARD3 CARD3)) (AND CARD2 CARD2)
(OR CARD0 CARD3)) (IF (ADF1 (ADF0 CARD4 CARD4 CARD2)
(AND CARD2 CARD2) (OR CARD0 CARD0)) (IF (ADF1 (ADF0
CARD2 CARD1 CARD1) (AND CARD2 CARD1) (OR CARD3 CARD3))
(IF CARD2 FOUR-OF-A-KIND FOUR-OF-A-KIND) (IF (OR (ADF1
CARD3 CARD2 CARD1) (OR CARD0 CARD3)) (IF (ADF1 (OR
CARD3 CARD3) (AND CARD2 CARD4) (OR CARD3 CARD3)) (IF
CARD2 FOUR-OF-A-KIND FOUR-OF-A-KIND) (IF (ADF1 (ADF0
CARD4 CARD4 CARD2) (AND CARD2 CARD1) (OR CARD0 CARD2))
(IF (ADF1 (ADF0 CARD4 CARD4 CARD2) (OR CARD4 CARD3) (OR
CARD0 CARD3)) (IF CARD2 FOUR-OF-A-KIND NIL) (IF CARD4
NIL NIL)) (IF CARD4 NIL NIL))) (IF CARD4 NIL NIL))) (IF
CARD4 NIL NIL)) (IF CARD4 NIL NIL)) (IF CARD4 NIL
NIL)))))).
When analyzed, this program proves to generalize successfully over the
entire problem environment. In the above program, ADF0 tests for the equality of the suits of its three arguments and ADF1 tests for the equality of the
denomination of its second and third arguments (ignoring its ARG0). If, for
the purposes of explanation, we define a new function S3= for three-way
suit-equality and a new function D2= as two-way denomination-equality (and
delete each unreferenced argument), the above program can be simplified to
(IF (S3= CARD1 CARD2 CARD0)
    (IF (S3= CARD3 CARD1 CARD4) FLUSH NIL)
    (IF (D2= CARD1 CARD3)
        (IF (D2= CARD4 CARD1)
            (IF (D2= CARD1 CARD2)
                FOUR-OF-A-KIND
                (IF (D2= CARD4 CARD0) FOUR-OF-A-KIND NIL))
            (IF (D2= CARD1 CARD0)
                (IF (D2= CARD2 CARD0) FOUR-OF-A-KIND NIL)
                NIL))
        (IF (D2= CARD4 CARD0)
            (IF (D2= CARD2 CARD0)
                (IF (D2= CARD4 CARD3)
                    FOUR-OF-A-KIND
                    (IF (D2= CARD1 CARD0) FOUR-OF-A-KIND NIL))
                NIL)
            NIL))).
We can now see that this program correctly performs the desired three-way
classification of a five-card hand from a pinochle deck.
This program achieves a correlation of 1.00 with no false positives and no
false negatives when cross-validated against the exhaustive set of 42,504
fitness cases. Again, we have both analytically verified and empirically
verified (cross-validated) that the genetically evolved program is a perfect
solution to the problem at hand.
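The simplified three-way program can be checked against the definitions of the two hand types over the whole problem environment. The Python sketch below is our transliteration of the S3=/D2= form of the program; the card representation and names are assumptions for illustration.

```python
from itertools import combinations

SUITS = ("CLUB", "DIAMOND", "HEART", "SPADE")
DENOMS = ("ACE", "KING", "QUEEN", "JACK", "TEN", "NINE")
DECK = [(denom, suit) for suit in SUITS for denom in DENOMS]


def s3(a, b, c):
    """Three-way suit equality (S3=)."""
    return a[1] == b[1] == c[1]


def d2(a, b):
    """Two-way denomination equality (D2=)."""
    return a[0] == b[0]


def classify(c0, c1, c2, c3, c4):
    four = "FOUR-OF-A-KIND"
    if s3(c1, c2, c0):
        return "FLUSH" if s3(c3, c1, c4) else "NIL"
    if d2(c1, c3):
        if d2(c4, c1):
            if d2(c1, c2):
                return four
            return four if d2(c4, c0) else "NIL"
        if d2(c1, c0):
            return four if d2(c2, c0) else "NIL"
        return "NIL"
    if d2(c4, c0) and d2(c2, c0):
        if d2(c4, c3):
            return four
        return four if d2(c1, c0) else "NIL"
    return "NIL"


def truth(hand):
    denoms = [card[0] for card in hand]
    if len({card[1] for card in hand}) == 1:
        return "FLUSH"
    if any(denoms.count(d) >= 4 for d in set(denoms)):
        return "FOUR-OF-A-KIND"
    return "NIL"


three_way_mismatches = sum(
    classify(*hand) != truth(hand) for hand in combinations(DECK, 5)
)
```

The program can return NIL as soon as the first three-way suit test succeeds but the second fails, because three cards of one suit rule out a four-of-a-kind in a deck where each denomination appears once per suit.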
17 Introduction to Biochemistry and Molecular Biology
This chapter provides an introduction to certain computational aspects of
biochemistry and molecular biology that are relevant to the problems considered in subsequent chapters.
17.1 CHROMOSOMES AND DNA
The structure of all living things on earth is specified by the information contained in nucleic acids, largely as chromosomes composed of deoxyribonucleic
acid (DNA).
The informational content of the DNA molecule can be viewed as a character string over a four-character alphabet representing the four nucleotide
bases, namely adenine (A), cytosine (C), guanine (G), and thymine (T).
Different fonts (tabulated in appendix C) are used in this and subsequent chapters to distinguish among the multiple single-letter codes used
by biochemists and molecular biologists. For example, the single letter C
denotes the nucleotide base cytosine mentioned above; C denotes the carbon atom; C denotes the amino acid residue cysteine (explained below);
and C denotes the carboxy terminal (end) of a protein (explained below).
The DNA molecule consists of a long sequence of these four nucleotide
bases. The genome of a biological individual is the sequence of nucleotide
bases along the DNA of all of its chromosomes. The human genome contains about 2,870,000,000 nucleotide bases. The genome of a simple bacterium, such as Escherichia coli, contains about 4 million nucleotide bases.
Normally two molecules of DNA are interwound to form a double helix.
Each DNA molecule redundantly stores information in complementary
pairs so each nucleotide base is always paired with a particular other base
in the complementary strand of DNA. Cytosine (C) is always paired with
guanine (G) and vice versa; adenine (A) is paired with thymine (T) and
vice versa. Thus, each strand of DNA contains the same information as
the other strand.
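The redundancy of the two strands can be illustrated with a short Python sketch. Representing a strand as a string over the letters A, C, G, and T follows the text; the function name is ours, and the sketch deliberately ignores the direction in which each strand is read.

```python
# Base-pairing rules from the text: C pairs with G, A pairs with T.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)


example = "GATTACA"
paired = complement_strand(example)
```

Applying `complement_strand` twice returns the original string, which is the sense in which each strand carries the same information as the other.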
Many higher organisms are diploid in the sense that they carry DNA
(not necessarily exactly identical) from both parents.
17.2 ROLE OF PROTEINS
Proteins are responsible for such a wide variety of biological structures and
functions that it can be said that the structure and function of living organisms are primarily determined by proteins (Stryer 1988). For example, some
proteins are used to generate nerve impulses (e.g., rhodopsin is the photoreceptor protein in retinal rod cells). Proteins enable signals to be communicated through the nervous system. Other proteins transport particles such as
electrons, atoms, or large macromolecules within living organisms (e.g.,
hemoglobin transports oxygen in blood). Some proteins store particular particles for later use (e.g., myoglobin stores oxygen in muscle). Some proteins
provide physical structure (e.g., collagen gives skin and bone their high tensile strength). Other proteins create physical contractile motion (e.g., actin
and myosin). Proteins are the basis of the immune system (e.g., antibodies
recognize and combine in highly specific ways with foreign entities such as
bacteria). Hormonal proteins transmit chemical instructions. Other proteins
control the expression of the genetic information contained in the nucleic acids.
Growth-factor proteins control growth and differentiation.
Perhaps the most important role of proteins is that they catalyze chemical
reactions in biological systems. Nearly all chemical reactions in biological
systems are catalyzed by a specific macromolecule (i.e., an enzyme) and nearly
all known enzymes are proteins. The catalytic power of enzymes is enormous, often changing the rate of a reaction by so many orders of magnitude
that they can effectively be viewed as determining whether or not the reaction occurs.
17.3 TRANSCRIPTION AND TRANSLATION
The informational content of DNA controls the manufacture of proteins by
the processes of transcription and translation. A typical protein is manufactured using the information contained in about one thousand nucleotide bases
(a kilobase). Proteins are manufactured by the ribosomes. A ribosome can be
viewed as a small factory whose input consists of the available molecular raw
materials and the informational content of the DNA molecule and whose
output consists of a protein.
The transcription process maps the information contained in the DNA molecule onto a messenger RNA (mRNA) molecule. This mapping is a one-to-one mapping of one nucleotide base of DNA onto one base of mRNA. Cytosine
(C) on DNA is mapped to guanine (G) of mRNA; guanine (G) on DNA is
mapped to cytosine (C) of mRNA; thymine (T) on DNA is mapped to
adenine (A) of mRNA. Adenine (A) of DNA is mapped to uracil (U), a substance that is closely related to thymine (T).
The ribosomes translate the sequential information of the messenger RNA
into a protein using the genetic code. The proteins of all living things on earth
are composed of linear strings of the same 20 amino acids. The translation
process maps a consecutive sequence of three bases of mRNA (called a codon)
into one of the 20 amino acids. Translation is performed in accordance with
the genetic code that maps each of the 64 possible combinations of the bases
found on mRNA into one of the 20 amino acids. In the genetic code, between
one and six of the 64 possible combinations can be mapped to one amino
acid. For example, a ribosome will translate a codon of messenger RNA consisting of the nucleotide bases G, G, and C into the amino acid glycine (G).
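Transcription and translation are both simple table lookups, as the following Python sketch suggests. The compact 64-character string is one conventional way of writing the standard genetic code with the bases ordered U, C, A, G; the use of "*" for codons that code for no amino acid, stopping translation at such a codon, and the function names are choices made for this illustration.

```python
# DNA base -> mRNA base, as described in the text.
TRANSCRIBE = {"C": "G", "G": "C", "T": "A", "A": "U"}

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Map each DNA base onto the corresponding mRNA base."""
    return "".join(TRANSCRIBE[base] for base in dna)


def translate(mrna: str) -> str:
    """Translate successive codons into single-letter amino acid codes."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":  # stop codon: no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, the DNA bases C, C, G are transcribed to the codon GGC, which `translate` maps to G (glycine), matching the example in the text.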
The amino acid residue called for by the codon of messenger RNA is supplied to the ribosome by a molecule of transfer RNA (tRNA) from the milieu
of the cell. Each molecule of tRNA carries an anti-codon which can bind, by
means of complementary base pairing, to a particular one of the 64 codons of
mRNA.
The genetic code is common to virtually all living things on earth.
Not all of the nucleotide bases of DNA are actually transcribed and translated into amino acids. The nucleotide bases that are ultimately expressed
into amino acids by the processes of transcription and translation are called
exons. The unexpressed nucleotide bases of DNA are called introns. The
introns are edited out.
A typical protein contains around 330 amino acids (i.e., about a thousand
actually transcribed and translated nucleotide bases). For humans, only about
3% of the 2,870,000,000 bases of the DNA are actually transcribed and translated into amino acids by means of the genetic code. If about 10^8 bases of
human DNA are actually expressed and if the human proteins average about
10^3 bases, then there would be around 10^5 human proteins. In simpler organisms, virtually 100% of the DNA is expressed. If all the 4,000,000 bases of the
DNA of the Escherichia coli bacterium are expressed and if E. coli proteins
average 1,000 bases, then there would be about 4,000 genes in E. coli. A gene is
the area of DNA that becomes expressed as a protein by the processes of
transcription and translation.
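The order-of-magnitude estimates above can be reproduced with a short calculation. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and the results are rough estimates rather than measured counts.

```python
# Figures quoted in the text.
HUMAN_BASES = 2_870_000_000      # bases of human DNA
EXPRESSED_FRACTION = 0.03        # about 3% are expressed
BASES_PER_PROTEIN = 1_000        # about a kilobase per protein
ECOLI_BASES = 4_000_000          # bases of E. coli DNA, assumed fully expressed

# About 8.6 * 10^7 expressed bases, which the text rounds to 10^8.
human_expressed = HUMAN_BASES * EXPRESSED_FRACTION

# About 8.6 * 10^4 proteins, which the text rounds to 10^5.
human_proteins = human_expressed / BASES_PER_PROTEIN

# Exactly 4,000 genes under the stated assumptions.
ecoli_genes = ECOLI_BASES // BASES_PER_PROTEIN

if __name__ == "__main__":
    print(f"{human_expressed:.2e} expressed human bases")
    print(f"{human_proteins:.2e} estimated human proteins")
    print(f"{ecoli_genes} estimated E. coli genes")
```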
Table 17.1 shows the standard single-letter codes for the 20 amino acids
occurring in proteins in alphabetic order, along with the full name of the amino
acid, the standard three-letter code for the amino acid, and the combinations
of bases of mRNA that are translated into that particular amino acid. An
asterisk is used to indicate that the third base does not matter. For example,
row 1 of this table shows GC* for the amino acid alanine, indicating that
alanine arises from GCA, GCC, GCG, and GCU. There are 4^3 = 64 three-character combinations of the four bases. A particular amino acid may arise from
between one and six combinations of bases. UAA, UAG, and UGA are stop
codons that terminate the translation process and do not appear in this table.
AUG codes for the start codon and for methionine. If it were not for the fact
that the initial methionine is frequently edited out of the final protein by post-translational editing processes, the first amino acid of all proteins would be
methionine. This table is the genetic code. With certain exceptions that may
reflect evolutionary development, the genetic code is virtually universal for
all life forms on earth.
Table 17.1 The 20 amino acids and the genetic code.
One-letter code   Amino acid      Three-letter code   Genetic code
A                 Alanine         Ala                 GC*
C                 Cysteine        Cys                 UGU, UGC
D                 Aspartic Acid   Asp                 GAU, GAC
E                 Glutamic Acid   Glu                 GAA, GAG
F                 Phenylalanine   Phe                 UUU, UUC
G                 Glycine         Gly                 GG*
H                 Histidine       His                 CAU, CAC
I                 Isoleucine      Ile                 AUU, AUC, AUA
K                 Lysine          Lys                 AAA, AAG
L                 Leucine         Leu                 UUA, UUG, CU*
M                 Methionine      Met                 AUG
N                 Asparagine      Asn                 AAU, AAC
P                 Proline         Pro                 CC*
Q                 Glutamine       Gln                 CAA, CAG
R                 Arginine        Arg                 CG*, AGA, AGG
S                 Serine          Ser                 UC*, AGU, AGC
T                 Threonine       Thr                 AC*
V                 Valine          Val                 GU*
W                 Tryptophan      Trp                 UGG
Y                 Tyrosine        Tyr                 UAU, UAC
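As a check on the table, the wildcard entries can be expanded into an explicit codon lookup and used to translate a short mRNA string. The sketch below is one possible encoding, not the book's own; the names GENETIC_CODE, build_codon_table, and translate are hypothetical, and the example mRNA string is invented for illustration.

```python
from itertools import product

BASES = "ACGU"

# Table 17.1 in compact form; "*" means the third base does not matter.
GENETIC_CODE = {
    "A": ["GC*"], "C": ["UGU", "UGC"], "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"], "F": ["UUU", "UUC"], "G": ["GG*"],
    "H": ["CAU", "CAC"], "I": ["AUU", "AUC", "AUA"], "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CU*"], "M": ["AUG"], "N": ["AAU", "AAC"],
    "P": ["CC*"], "Q": ["CAA", "CAG"], "R": ["CG*", "AGA", "AGG"],
    "S": ["UC*", "AGU", "AGC"], "T": ["AC*"], "V": ["GU*"],
    "W": ["UGG"], "Y": ["UAU", "UAC"],
}
STOP_CODONS = {"UAA", "UAG", "UGA"}
ALL_CODONS = {"".join(triple) for triple in product(BASES, repeat=3)}


def build_codon_table():
    """Expand the wildcard patterns into a codon -> one-letter-code mapping."""
    table = {}
    for amino, patterns in GENETIC_CODE.items():
        for pattern in patterns:
            thirds = BASES if pattern[2] == "*" else pattern[2]
            for third in thirds:
                table[pattern[:2] + third] = amino
    return table


CODON_TABLE = build_codon_table()


def translate(mrna):
    """Translate consecutive codons until a stop codon or the end of the string."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE), "codons map to amino acids")
    print(translate("GGC"))
    print(translate("AUGGGCUGUUCAUAA"))
```

Expanding the table this way makes the bookkeeping in the text easy to confirm: the amino-acid codons plus the three stop codons should together account for every one of the 64 combinations.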
17.4 AMINO ACIDS AND PROTEIN STRUCTURE
Proteins are a relatively homogeneous class of molecules in spite of their many
different and wide-ranging biological functions. All proteins are composed
of amino acids arranged in a linear chain. The molecular structure of the beginning and the end of every protein chain, as well as the molecular structure
of the bonding of the adjacent amino acids along every chain, is identical for
all proteins. Proteins differ in the particular sequence of the 20 amino acids
that appear along the chain and in the three-dimensional structure that
arises as a consequence of the particular sequence of amino acids.
A protein consists of a sequence of amino acids (also called residues). The
backbone (main chain) of a protein starts at an N terminal (amino terminal)
consisting of H3N+- and ends at its C terminal (carboxy terminal) consisting of
-COO-. Between these two ends, a protein consists of repetitions of a
nonvariable group of six atoms and a variable group of atoms, called the side
chain. Three of the six nonvariable atoms appear along the backbone of the
protein. The backbone contains repetitions of one nitrogen atom N, one central carbon atom (called the α-carbon or Cα carbon), and a second carbon atom
(called the C' carbon). The other three of the six nonvariable atoms are
attached to the backbone. The attached atoms include one hydrogen atom
H covalently bonded to the nitrogen of the backbone, another H covalently
bonded to the Cα carbon atom of the backbone, and one oxygen atom O
bonded to the C' carbon of the backbone. There are as many repetitions of
these six atoms as there are amino acids in the protein. The bond linking
the C' carbon of each group to the nitrogen N of the next group along the
chain is called a peptide bond. Consequently, proteins are called polypeptides.
Figure 17.1 Hypothetical protein of length three with unspecified side chains, R1, R2, and R3.
The side chain attached to each α-carbon Cα along the backbone is the variable part of the protein. The particular side chain that is attached to each
particular α-carbon is specified by the three bases of the codon of mRNA
(and therefore originally by the three nucleotide bases of the DNA that were
transcribed into the three bases of the codon of mRNA).
Figure 17.1 shows a hypothetical protein sequence with three unspecified side chains R1, R2, and R3, each connected to an α-carbon atom Cα. Position 1 of this figure contains the N terminal (the H3N+-). Side chain R1 is
connected to its α-carbon. Position 2 is the generic intermediate position of
the protein; its side chain, R2, is connected to its α-carbon. Position 3 contains
the C terminal (the -COO-). Side chain R3 is connected to its α-carbon. The
backbone of the protein runs horizontally through the middle of the figure
connecting the N, Cα, C', N, Cα, C', N, and Cα. If the length of the protein
sequence were greater than 3 (the average is about 330), there would be
additional copies of the entire structure shown in position 2 of this figure,
that is, the N, Cα, and C' of the backbone, the H, H, and O connected to
these three backbone atoms, and one of the 20 possible side chains.
Each of the 20 possible side chains is a particular chemical composition of
atoms. The side chain for the amino acid glycine (denoted Gly or G) consists
of only one atom (a hydrogen H). The side chains of the other 19 amino acids
each consist of several atoms. For example, the side chain of the amino acid
cysteine (Cys or C) consists of one sulfur S, one carbon C, and two hydrogen
atoms (-CH2-S). The side chain for serine (Ser or S) consists of one carbon C,
one oxygen O, and three hydrogen atoms (-CH2-OH).
Figure 17.2 shows a hypothetical protein sequence of length 3; it is the same
as figure 17.1, except that the side chains, R1, R2, and R3, are now specified as
glycine, cysteine, and serine.
Figure 17.2 Hypothetical protein consisting of glycine, cysteine, and serine.
RPDFCLEPPY TGPCKARIIR YFYNAKAGLC QTFVYGGCRA KRNNFKSAED
CMRTCGGA
Figure 17.3 Primary structure of bovine pancreatic trypsin inhibitor (BPTI).
17.5 PRIMARY STRUCTURE OF PROTEINS
The sequence of amino acid residues along a protein chain is called the primary structure of a protein. For example, the primary structure of the hypothetical protein of figure 17.2 is Gly-Cys-Ser using the standard three-letter
codes for amino acid residues and GCS using the standard one-letter codes.
Once the primary structure is specified, all atoms of the protein are specified.
As a further illustration, bovine pancreatic trypsin inhibitor (BPTI) is an
atypically small protein containing only 58 amino acid residues (about a fifth
of the number in an average protein). Figure 17.3 shows the one-letter codes
for the 58 amino acid residues in the primary structure of BPTI.
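Because a primary structure is simply a string over a 20-letter alphabet, it can be handled with ordinary string operations. The sketch below stores the sequence of figure 17.3 and looks up residues by their 1-based position, which is the numbering convention used in the text; the helper name residue is hypothetical.

```python
# Primary structure of BPTI, copied from figure 17.3 (blocks of ten residues).
BPTI = (
    "RPDFCLEPPY" "TGPCKARIIR" "YFYNAKAGLC"
    "QTFVYGGCRA" "KRNNFKSAED" "CMRTCGGA"
)


def residue(sequence, position):
    """Return the one-letter code at a 1-based position along the chain."""
    return sequence[position - 1]


# 1-based positions of every cysteine (C) in the chain.
cysteine_positions = [i for i, aa in enumerate(BPTI, start=1) if aa == "C"]

if __name__ == "__main__":
    print(len(BPTI), "residues")
    print("residue 1:", residue(BPTI, 1), " residue 58:", residue(BPTI, 58))
    print("cysteines at positions", cysteine_positions)
```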
17.6 SECONDARY STRUCTURE OF PROTEINS
Amino acid residues along the protein chain often arrange themselves locally
into certain features, α-helices and β-strands being the most common. These
features constitute the secondary structure of the protein. The α-helix is a regular spiral-like structure in which the main chain spirals much like the red
spiral stripe on a barber's pole. The β-strand is a flatter regular structure in
which the main chain zigzags like a sheet of corrugated metal. Two β-strands
may form a β-sheet in which alternate amino acid residues along each zigzagging β-strand are joined by hydrogen bonds.
Table 17.2 shows the secondary structure of bovine pancreatic trypsin
inhibitor (BPTI). BPTI contains two α-helices and two β-strands.
The first α-helix (called H1), shown in table 17.2, encompasses six residues
of BPTI beginning with the proline located at the second position along the
backbone of the protein starting at the N-terminal end (referred to as "proline
2" or "Pro 2" using the three-letter code for amino acids) and ending with the
glutamic acid located at the seventh position (referred to as "Glu 7"). The
second α-helix, H2, starts at serine 47 (Ser 47) and ends at glycine 56 (Gly 56).
The average α-helix is about a dozen residues long, so helix H2 is of average
size.
Table 17.2 Features of the secondary structure and the disulfide bonds of bovine
pancreatic trypsin inhibitor (BPTI).
Feature   Type of feature   Start    End
H1        α-helix           Pro 2    Glu 7
H2        α-helix           Ser 47   Gly 56
S1        β-strand          Leu 29   Tyr 35
S2        β-strand          Ile 18   Asn 24
SS1       Disulfide bond    Cys 5    Cys 55
SS2       Disulfide bond    Cys 14   Cys 38
SS3       Disulfide bond    Cys 30   Cys 51
The first β-strand, S1, of BPTI in table 17.2 starts at leucine (Leu) 29 and
ends at tyrosine (Tyr) 35. The second β-strand, S2, starts at isoleucine (Ile) 18
and ends at asparagine (Asn) 24.
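The helix and strand boundaries of table 17.2 can be cross-checked against the primary structure of figure 17.3 by slicing the sequence. This is a minimal sketch; the FEATURES list and the segment helper are illustrative names, and positions are taken to be 1-based and inclusive as in the text.

```python
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"

# (feature, first residue, last residue), 1-based and inclusive, from table 17.2.
FEATURES = [("H1", 2, 7), ("H2", 47, 56), ("S1", 29, 35), ("S2", 18, 24)]


def segment(sequence, first, last):
    """Return the residues from position first through position last."""
    return sequence[first - 1:last]


if __name__ == "__main__":
    for name, first, last in FEATURES:
        residues = segment(BPTI, first, last)
        print(name, len(residues), residues)
```

Each slice should begin and end with the residues named in the table (for example, H1 begins with P for proline and ends with E for glutamic acid).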
About a quarter of all amino acid residues of a typical protein are organized into α-helices and approximately another quarter of all residues are
organized into β-strands.
The omega loop (chapter 19) is an irregular loop structure on the surface of
a protein; the omega loop is shaped somewhat like the Greek letter Ω. Ω-loops
account for approximately another quarter of all residues and are considered
by some to be another secondary structure of proteins. The 3_10-helix (which is
tighter than the α-helix) occurs much less frequently than the α-helices and β-strands. There are no omega loops in BPTI.
In addition, the disulfide bond is an important stability-conferring structure
that covalently links distant pairs of cysteine residues in some proteins. BPTI
has three disulfide bonds: one linking Cysteine 5 with Cysteine 55; one linking Cysteine 14 and Cysteine 38; and one linking Cysteine 30 and Cysteine
51. We include the disulfide bonds along with the α-helices and β-strands as
part of table 17.2 (even though these covalent disulfide bonds are properly
considered part of the protein's primary structure).
Figures 17.4 through 17.7 show the general structure (after Hamaguchi 1993)
of the backbone of BPTI. Circles denote the location of the α-carbons along
the backbone. These two-dimensional figures give a general idea of the
arrangement of the features of BPTI; however, they do not purport to be an
accurate projection from any perspective of the three-dimensional structure
of BPTI. A more accurate three-dimensional view of this protein appears in
Genetic Programming II Videotape: The Next Generation (Koza and Rice 1994).
In figure 17.4 each of the 58 residues of BPTI is represented by a circle containing the residue number and the one-letter code for the residue. Residue 1
at the N-terminal end of the protein is arginine (R); residue 58 at the C-terminal end is alanine (A). Both residues 1 and 58 are found near the bottom of
this figure.
Figure 17.4 General structure of bovine pancreatic trypsin inhibitor.
Figure 17.5 shows the two α-helices of BPTI, one located between residues
2 and 7 and one located between residues 47 and 56. In this figure, the 58
residues along the backbone are reduced to open circles to highlight the
α-helices.
Figure 17.6 highlights the two β-strands of BPTI, one running between residues 18 and 24 and the other running between residues 29 and 35. Contrary
to the impression that might be created by this two-dimensional figure, the
two β-strands do not intersect in three-dimensional space.
Figure 17.7 shows the three disulfide bonds: one linking cysteines 5 and 55;
one linking cysteines 14 and 38; and one linking cysteines 30 and 51.
17.7 TERTIARY STRUCTURE OF PROTEINS
As a protein is being manufactured by a ribosome, the entire protein chain
spontaneously folds into a unique three-dimensional spatial arrangement,
called its native structure, conformation, or the tertiary structure of the protein.
The tertiary structure of a protein consists of the three-dimensional spatial
arrangement of all the atoms of the protein. The behavior and function of a
protein in an organism are primarily dependent on the precise three-dimensional location of its individual atoms. For example, certain key areas, called
active sites, may interact with particular other molecules in highly specific
ways. Knowledge of the three-dimensional structure of a protein is usually
required to fully understand how a protein performs its biological function.
Figure 17.5 The two α-helices of bovine pancreatic trypsin inhibitor.
Figure 17.6 The two β-strands of bovine pancreatic trypsin inhibitor.
Figure 17.7 The three disulfide bonds of bovine pancreatic trypsin inhibitor.
The three-dimensional structure into which a protein spontaneously folds
is a unique characteristic of the particular protein (under specified prevailing
physiological conditions). This unique three-dimensional structure is thought
by many to be a global energy minimum or near-minimum for the protein (or
at least the minimum or near-minimum that is accessible to all unfolded states
of the chain starting from the time of its manufacture by the ribosome).
It is broadly true that the unique three-dimensional spatial structure of a
protein is determined by the primary sequence of the protein (Anfinsen 1973).
In certain special cases, other molecules (e.g., chaperone molecules) may aid in
the folding of some particular proteins; certain other molecules may sometimes be necessary to maintain the folded conformation of some proteins;
and various post-translational modifications may occur after the protein's
manufacture by the ribosome for some proteins.
Determining the relationship between the primary structure of a protein
(i.e., the linear sequential arrangement of amino acids) and its three-dimensional structure (i.e., the three-dimensional coordinates of every atom of the
protein) is the premier problem of contemporary molecular biology and is
called the protein folding problem (Schulz and Schirmer 1979; Gierasch and King
1990; Branden and Tooze 1991; Lesk 1991; Creighton 1993).
The nature of the protein folding problem can be appreciated by considering the three-dimensional coordinates of the atomic structure of one small
part of one particular protein. The Protein Data Bank (PDB) maintained by
the Brookhaven National Laboratory in Upton, New York (Bernstein et al.
1977) is the worldwide computerized repository of the three-dimensional
coordinates of the atomic structure of proteins.
Table 17.3 shows two small portions of the tertiary structure of bovine pancreatic trypsin inhibitor (BPTI) from the PDB. Specifically, the table shows the
x, y, and z three-dimensional coordinates for 20 of the 918 atoms listed in the
PDB for BPTI (under the code 5PTI). The 20 atoms shown come from two
groups, 10 being associated with Cysteine 5 and 10 with Cysteine 55. The first
and second columns of the table identify the amino acid residues; the third
and fourth columns identify the individual atoms belonging to the protein
backbone and the cysteine residues. The fifth, sixth, and seventh columns
show the x, y, and z coordinates of the atom.
The protein backbone consists of the six atoms common to each residue
of every protein sequence. Atoms 74-77 are the nitrogen, the α-carbon,
the C'-carbon, and the oxygen, respectively, belonging to the protein backbone for Cysteine 5; atoms 883-886 are the nitrogen, the α-carbon, the
C'-carbon, and the oxygen, respectively, belonging to the protein backbone for Cysteine 55.
Atom 81 is the hydrogen of the backbone that is bonded to the α-carbon for
Cysteine 5; atom 890 is the hydrogen of the backbone bonded to the α-carbon
of Cysteine 55.
Atom 80 is the hydrogen (called the amide hydrogen, denoted by D-H) of the
backbone bonded to the nitrogen of the main chain for Cysteine 5; atom 889 is
the amide hydrogen of the backbone bonded to the nitrogen of the main chain
for Cysteine 55.
Each cysteine side chain consists of one sulfur, one carbon, and two
hydrogen atoms (i.e., -CH2-S). Atom 78 is the carbon belonging to Cysteine 5
(called the β-carbon of the residue); atom 887 is the corresponding β-carbon
of Cysteine 55. Atom 79 is the sulfur of Cysteine 5; atom 888 is the corresponding sulfur of Cysteine 55. Atoms 82-83 are the two hydrogen atoms
belonging to Cysteine 5; atoms 891-892 are the corresponding two hydrogen
atoms belonging to Cysteine 55.
Table 17.3 Two small portions of the tertiary structure of bovine pancreatic trypsin
inhibitor (BPTI) from the Protein Data Bank.
Amino acid residue   Residue number   Atom number   Atom    x        y        z
Cys                  5                74            N       32.757   10.236   -6.732
Cys                  5                75            α-C     31.286   10.029   -6.794
Cys                  5                76            C       30.864    8.652   -7.254
Cys                  5                77            O       29.690    8.279   -7.116
Cys                  5                78            β-C     30.794   11.065   -7.789
Cys                  5                79            γ-S     31.075   12.797   -7.325
Cys                  5                80            D-H     33.206   10.888   -7.363
Cys                  5                81            α-H     30.964   10.266   -5.800
Cys                  5                82            β1-H    31.501   10.859   -8.603
Cys                  5                83            β2-H    29.793   10.892   -8.171
Cys                  55               883           N       28.364   15.919   -6.980
Cys                  55               884           α-C     28.337   14.779   -7.839
Cys                  55               885           C       27.258   14.663   -8.899
Cys                  55               886           O       27.484   13.831   -9.733
Cys                  55               887           β-C     28.265   13.520   -5.893
Cys                  55               888           γ-S     29.664   13.161   -5.893
Cys                  55               889           D-H     27.614   15.974   -6.323
Cys                  55               890           α-H     29.253   14.775   -8.417
Cys                  55               891           β1-H    27.388   13.519   -6.349
Cys                  55               892           β2-H    28.059   12.720   -7.695
As an illustration of the way that proteins fold in three-space as shown by
the PDB, consider atoms 79 and 888 (the sulfur atoms, S, belonging to the two
cysteines, Cysteine 5 and Cysteine 55). The three-dimensional coordinates of
sulfur atom 79 are
[31.075, 12.797, -7.325]
and the coordinates of sulfur atom 888 are
[29.664, 13.161, -5.893].
The Euclidean distance between these two sulfur atoms is 2.043 Å. Using the
fact that one hydrogen atom is about one Å in size, these two sulfur atoms
are, in the context of proteins, very close to one another in three-space. In
other words, although these two cysteine residues occupy very distant positions in the primary sequence (positions 5 and 55), they are physically very
close in three-space after the protein spontaneously folds into its tertiary structure. In fact, after the folding, the two sulfur atoms of the cysteine residues
participate in a disulfide bond. Disulfide bonds confer considerable additional stability to the three-dimensional structure into which the protein has
folded itself. Similarly, the sulfur atoms of Cysteine 14 and Cysteine 38 and
the sulfur atoms of Cysteine 30 and Cysteine 51 are very close in the folded
protein and, in fact, form disulfide bonds.
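The distance quoted for the sulfur atoms of Cysteine 5 and Cysteine 55 can be checked directly from the two coordinate triples given in the text. The sketch below is a plain Euclidean-distance calculation; the constant and function names are illustrative, and the coordinates are taken to be in Å.

```python
import math

# Coordinates quoted in the text for the two sulfur atoms (assumed to be in Å).
S_CYS5 = (31.075, 12.797, -7.325)    # atom 79, sulfur of Cysteine 5
S_CYS55 = (29.664, 13.161, -5.893)   # atom 888, sulfur of Cysteine 55


def euclidean_distance(p, q):
    """Straight-line distance between two points in three-space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


distance = euclidean_distance(S_CYS5, S_CYS55)

if __name__ == "__main__":
    print(f"{distance:.3f}")  # 2.043
```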
The protein folding problem can be restated in terms of figure 17.3 showing the amino acid residues of the primary structure and in terms of table 17.3
showing the tertiary structure as follows: Given the primary sequence of amino
acid residues of a protein in the format of figure 17.3, predict the three-dimensional x, y, and z coordinates for 100% of the atoms of the protein in the format
of table 17.3.
In the past, the primary sequences of proteins were typically determined
through time-consuming chemical analysis of the proteins themselves. However, because the primary sequence of proteins is specified by the underlying
chromosomal DNA sequence and because of the extensive current worldwide
effort to map the entire DNA sequences of various organisms (e.g., E. coli
bacteria, yeast, fruit fly, white mouse, and humans in the Human Genome
Project), primary sequences are becoming available at a rapid rate. As of the
end of 1993, the primary structures of approximately 33,329 proteins containing 11,484,420 amino acid residues from various organisms have been determined and deposited in various computerized databases (such as the
SWISS-PROT database).
In contrast, determining the tertiary structure of proteins requires x-ray
crystallography or nuclear magnetic resonance (NMR) techniques. These
determinations are exceedingly time-consuming, the crystallographic
method currently requiring about three years of work for each protein
studied. Consequently, the number of proteins whose tertiary structures
are known is a tiny fraction of the number of proteins whose primary structures are known. For example, the April 1993 quarterly release of the Protein Data Bank contains 1,110 fully annotated atomic coordinate entries
(of which 56 were new for that quarter). Many of the studies in the PDB
are of the same protein with different levels of crystallographic resolution, of the same protein under different conditions, of mutants of the same
protein (natural or engineered), and of functionally similar proteins from
different species (often those thought to be evolutionarily related). Depending on the criteria for similarity, there are only about 150 to 200 "different" tertiary structures available in the PDB. This number is, of course,
only a tiny fraction of the estimated 100,000 different proteins in humans
and the estimated 4,000 different proteins in E. coli. This already considerable gap is widening.
Thus, there is a major need for automated methods of predicting tertiary
structure from primary structure. Since the rules for protein folding are largely
unknown, artificial intelligence, machine learning, and automatic programming may provide a way to satisfy part of this need. In some cases, approximate or likely solutions to problems involving the secondary and tertiary
structure of proteins may have practical uses in increasing the understanding
of proteins.
Of course, the fact that there are only 150 to 200 different tertiary structures
in the PDB is a severe limitation on the operation of automated methods for
prediction that rely on recognizing and generalizing patterns and relationships. All such methods, whether they be based on statistics, neural networks,
genetic algorithms, decision trees, clustering, or other methods of machine
learning or automatic programming, must be guided by a reasonably large
number of examples of the relationship between the variables of interest. The
study of protein folding is further complicated by the fact that the 150 to 200
different proteins in the PDB are atypical of proteins in general in a number of
important respects. For example, the proteins contained in the PDB are those
for which it is practical, economical, politically acceptable, or possible to isolate the protein in stable form and to grow crystals. In addition, because of the
many practical limitations of crystallographic techniques, the PDB tends to
contain atypically short proteins. The average length of proteins in the PDB is
only about 175 residues, compared to an overall average length of roughly
330 for all proteins. There are, of course, many more degrees of freedom in the
folding process for proteins of average and above-average size than there are
for atypically small proteins.
17.8 QUATERNARY STRUCTURE OF PROTEINS
Some proteins contain more than one chain (subunit). The three-dimensional
spatial arrangement of subunits is the quaternary structure of a protein.
For example, hemoglobin (an oxygen-transporting protein) contains four
subunits called α1, α2, β1, and β2. The two alpha subunits are identical and the
two beta subunits are identical. Moreover, the alpha subunits are very similar
to the beta subunits. Each of these four subunits of hemoglobin is, in turn,
very similar to the myoglobin molecule. An iron-containing heme group is
associated with the myoglobin molecule. Oxygen binds to the iron atom,
thereby permitting oxygen to be stored by myoglobin. Oxygen similarly binds
to the four iron atoms in hemoglobin, thereby permitting oxygen to be
transported in the bloodstream.
A single human red blood cell contains about 275,000,000 molecules of
hemoglobin. These molecules are identical except for the physical location in three-space of their constituent atoms.
17.9 GENETIC ALGORITHMS AND MOLECULAR BIOLOGY
Genetic algorithms are being increasingly applied to problems of molecular
biology. Lucasius and Kateman 1989 applied genetic algorithms to
chemometrics. Konagaya and Kondou 1993 extracted stochastic motifs from
sequences using a genetic algorithm and the minimum description length
(MDL) principle. Platt and Dix 1993 constructed restriction maps using a
genetic algorithm. Cedeno and Vemuri 1993 investigated DNA mapping with
genetic algorithms. Fickett and Cinkosky 1993 applied the genetic algorithm
to assembling chromosome physical maps. Sun 1993 used genetic algorithms
with a reduced representation model of protein structure prediction. Unger
and Moult (1993a, 1993b, 1993c) applied genetic algorithms to evolving self-avoiding curves resembling the way proteins fold. See also Cantor and Lim
1991, and Lim, Fickett, Cantor, and Robbins 1993.
Ishikawa et al. (1993) and Tajima (1993) applied parallel genetic algorithms to sequence alignment. Schulze-Kremer (1993) applied genetic
algorithms to tertiary structure prediction for the protein crambin. See
also Takagi (1993).
Jones (1993) used genetic algorithms for searching databases of chemical
structures. Lucasius et al. (1991) used genetic algorithms for a conformational analysis of DNA. Marcel et al. (1992) used genetic algorithms for a
conformational analysis of a dinucleotide photodimer. Hibbert (1993) studied
the display of chemical structures in two dimensions using genetic algorithms.
Le Grand (1993) tested the genetic algorithm for performing conformational search on polypeptides and proteins on the 46-residue protein crambin
with the AMBER potential energy function. A knowledge-based potential
energy function was developed and used to predict the structures of melittin,
pancreatic polypeptide, and crambin.
18 Prediction of Transmembrane Domains
in Proteins
Many problems involving the computational analysis of proteins are similar
to the pattern-recognition problems in chapters 15 and 16 in that a major part
of the problem is the dynamic evolution of initially-unknown reusable feature detectors.
In addition, a number of problems from computational biology are similar
to the flush problem of chapter 16 in that correlation is a reasonable measure
for fitness when genetic programming is applied to the problem.
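For a two-class prediction problem, one common way to compute such a correlation is from the counts of true and false positives and negatives. The sketch below uses that formulation (often called the Matthews correlation coefficient); treating it as the correlation intended here is an assumption, and the function name and the convention of returning 0.0 when the denominator vanishes are illustrative choices.

```python
import math


def correlation(true_pos, true_neg, false_pos, false_neg):
    """Correlation between predicted and actual classes for a two-class problem.

    Returns a value in [-1.0, +1.0]; 0.0 is returned when the denominator is
    zero (an illustrative convention, e.g. when every prediction is the same).
    """
    numerator = true_pos * true_neg - false_pos * false_neg
    denominator = math.sqrt(
        (true_pos + false_pos) * (true_pos + false_neg)
        * (true_neg + false_pos) * (true_neg + false_neg)
    )
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    print(correlation(10, 10, 0, 0))   # perfect predictor
    print(correlation(0, 0, 10, 10))   # always wrong
    print(correlation(5, 5, 5, 5))     # no better than chance
```

A measure of this kind rewards a predictor only for agreement beyond chance, which is useful when the two classes are unequal in size.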
However, there are four important practical differences between the flush
problem and problems from the real world. First, the entire problem environment (i.e., the universe of fitness cases) was known for the flush problem, but
it is not usually known for problems from the real world. Second, the entire
problem environment was sufficiently small for the flush problem (i.e., 42,504
five-card hands) that it was possible to cross-validate a genetically evolved
predicting program by exhaustively testing it against the entire environment;
this is also generally not the case for problems from the real world. Third, a
100%-correct solution to the flush problem is attainable so there is no question as to how to define the success predicate of the problem; we usually do
not have the luxury of sufficient foreknowledge to know how to do this for a
problem from the real world. Fourth, the problem environment is fully understood so it is possible to verify analytically that a program is a 100%-correct solution to the problem; we rarely have the luxury of certainty for a
practical problem.
This chapter and chapters 19 and 20 consider several problems of pattern
recognition and classification from computational biology.
The problem of deciding whether a given protein segment is a transmembrane domain provides an opportunity to illustrate the automatic discovery
of reusable feature detectors, to again employ correlation as the fitness measure, to incorporate iteration into genetically evolved computer programs,
and to illustrate the use of state (memory) in genetically evolved programs.
In this chapter, genetic programming is used to create a computer program
for predicting whether or not a given subsequence of amino acids in a protein
is a transmembrane domain of the protein. Genetic programming will be given
a set of differently-sized protein segments and the correct classification for
each segment. The predicting program will consist of initially-unspecified
detectors, an initially-unspecified iterative calculation incorporating the as-yet-undiscovered detectors, and an initially-unspecified final result-producing calculation incorporating the results of the as-yet-undiscovered iteration.
Although we will give a biological interpretation of the results, the automated process does not know the chemical characteristics or biological meaning of the sequence of amino acids appearing in the protein segment. Similarly,
the reader may ignore the biological interpretation and view this problem as
a one-dimensional pattern recognition problem. The techniques used in this
chapter to do calculations on a protein sequence can be applied to any sequence
or time series (e.g., economic data).
18.1 BACKGROUND ON TRANSMEMBRANE DOMAINS IN
PROTEINS
Membranes play many important roles in living things. A transmembrane protein (Yeagle 1993) is embedded in a membrane in such a way that part of the
protein is located on one side of the membrane, part is within the membrane,
and part is on the opposite side of the membrane. The membrane involved
may be a cellular membrane or some other type of membrane. Transmembrane proteins often cross back and forth through the membrane several times
and have short loops immersed in the different milieu on each side of the
membrane. Understanding the behavior of transmembrane proteins requires
identification of the portion(s) of the protein that are actually embedded within
the membrane, such portion(s) being called the transmembrane domain(s) of
the protein. The lengths of the transmembrane domains of a protein are usually different from one another and the lengths of the non-transmembrane
areas are also usually different from one another.
Transmembrane proteins often perform functions such as sensing the presence of certain particles or certain stimuli on one side of the membrane and
transporting particles or transmitting signals to the other side of the membrane. For example, the transmembrane protein rhodopsin is the photosensitive pigment of retinal rod cells. Cystic fibrosis transmembrane conductance
regulator is a transmembrane protein, implicated in the genetic disease of cystic fibrosis, that controls the flow of chloride in and out of lung cells. When a
functionally correct copy of the gene producing this protein is not inherited
from at least one parent, thick mucus builds up, causing lung damage and
eventual death (often by age 20 or 30).
The goal in this section is to use genetic programming to evolve a computer program for predicting whether or not a particular protein segment
(i.e., a subsequence of amino acid residues extracted from the entire sequence)
is a transmembrane domain. Biological membranes are of oily hydrophobic
(water-hating) composition. The amino acids in the transmembrane domain
of a protein that are exposed to the membrane therefore have a pronounced,
but not overwhelming, tendency to be hydrophobic. Many transmembrane
domains are α-helices.
It should be noted that some transmembrane domains are β-sheets (Schirmer
and Cowan 1993). Protein segments of this type can be identified as being
transmembrane domains by the predominantly hydrophobic nature of the
particular residues that the β-sheet actually exposes to the membrane.
Because they are extremely difficult to analyze in the laboratory, very few
transmembrane proteins of this latter type currently appear in the existing
computerized databases. This bias in the computerized databases has the
practical effect of excluding transmembrane proteins of the β-sheet type from
our experiments here.
Figure 18.1 shows a topological model of the transmembrane protein
bacteriorhodopsin from the Halobacterium salinarium bacterium (Teufel et al.
1993). Bacteriorhodopsin is a photosensitive protein that enables bacteria to
respond to light. It acts as a light-driven proton pump. The left of the figure
corresponds to the outside of the cell and the right corresponds to the
intracellular region. Bacteriorhodopsin is a 248-residue protein that is folded
into a bundle of seven α-helices. The N-terminal end of the protein (residue 1)
is located outside the cell. The seven α-helices are embedded in the cellular
membrane of the bacterium and are shown in the large rectangles. These seven
transmembrane domains are then connected by relatively short
extramembrane loops, three on the outside of the cell and three on the inside.
The C-terminal end of the protein (residue 248) is located inside the cell. If the
membrane is viewed as a mattress, the seven transmembrane domains
(α-helices) are the springs.
The hydrophobicity scale of Kyte and Doolittle (1982) assigns a numerical
value for hydrophobicity to each of the 20 amino acid residues.
Table 18.1 shows the 20 amino acid residues (columns 3, 4, and 5) arranged
in order according to the Kyte-Doolittle hydrophobicity scale. The 20 Kyte-Doolittle hydrophobicity values in column 2 can reasonably be clustered into
the three categories shown in column 1: seven of the 20 amino acid residues
can be categorized as hydrophobic, six as neutral, and seven as hydrophilic
(water-loving). As can be seen, isoleucine (I) has the largest positive value in
the table and is therefore the most hydrophobic residue according to this scale.
On the other hand, R, K, D, and E are the most hydrophilic; they are electrically charged.
Hydrophobicity is not a precisely defined characteristic. Over a dozen other
different hydrophobicity scales appear in the literature. The Hopp-Woods
hydrophobicity scale (Hopp and Woods 1981) is one of the many other such
scales. These alternative scales differ considerably from one another as to the
relative numerical value assigned to the 20 amino acids. In some instances,
the differences are great enough to affect the rank order of the amino acids
and the categories into which the amino acids are most naturally clustered.
There is no consensus on which hydrophobicity scale, if any, is best suited for
this particular problem. Nonetheless, the Kyte-Doolittle scale and the resulting three categories are suitable for the limited purpose of discussing
how hydrophobicity relates to whether a protein segment is a transmembrane domain.
Figure 18.1 Bacteriorhodopsin protein consisting of seven transmembrane α-helices.
Table 18.1 Kyte-Doolittle hydrophobicity values for the 20 amino acid residues.
Category     Kyte-Doolittle  One-letter  Amino acid     Three-letter
             value           code                       code
Hydrophobic  +4.5            I           Isoleucine     Ile
Hydrophobic  +4.2            V           Valine         Val
Hydrophobic  +3.8            L           Leucine        Leu
Hydrophobic  +2.8            F           Phenylalanine  Phe
Hydrophobic  +2.5            C           Cysteine       Cys
Hydrophobic  +1.9            M           Methionine     Met
Hydrophobic  +1.8            A           Alanine        Ala
Neutral      -0.4            G           Glycine        Gly
Neutral      -0.7            T           Threonine      Thr
Neutral      -0.8            S           Serine         Ser
Neutral      -0.9            W           Tryptophan     Trp
Neutral      -1.3            Y           Tyrosine       Tyr
Neutral      -1.6            P           Proline        Pro
Hydrophilic  -3.2            H           Histidine      His
Hydrophilic  -3.5            Q           Glutamine      Gln
Hydrophilic  -3.5            N           Asparagine     Asn
Hydrophilic  -3.5            E           Glutamic Acid  Glu
Hydrophilic  -3.5            D           Aspartic Acid  Asp
Hydrophilic  -3.9            K           Lysine         Lys
Hydrophilic  -4.5            R           Arginine       Arg

Figure 18.2 shows the 161 amino acid residues of mouse peripheral myelin
protein 22. This protein is one of the 33,329 proteins appearing in release 27 in
late 1993 of the SWISS-PROT computerized database of protein sequences
(Bairoch and Boeckmann 1991) and is identified in that database by the locus
name "PM22_MOUSE". The first residue (at the N-terminal end of the protein) is methionine (M); the 161st residue (at the C-terminal end) is leucine (L).
This protein has transmembrane domains located at residues 2-31, 65-91,
96-119, and 134-156. These four transmembrane domains are boxed in the
figure.
For example, the third transmembrane domain of mouse peripheral myelin
protein 22 consists of the 24 residues (boxed in figure 18.2) between positions
96 and 119:
FYITGFFQILAGLCVMSAAAIYTV.
The 27 residues between positions 35 and 61 (underlined and in lower case in
figure 18.2) are
TTDLWQNCTTSALGAVQHCYSSSVSEW
and are an example of a randomly chosen non-transmembrane area of
this protein.
Figure 18.2 Primary sequence of mouse peripheral myelin protein 22 with four transmembrane domains (boxed) and one randomly chosen non-transmembrane area (underlined).
Columns 1 and 2 of table 18.2 show the amino acid residues 90-125 from
the neighborhood containing the third transmembrane domain (located
at residues 96-119) of mouse peripheral myelin protein 22. The third
column shows the hydrophobicity category of the residue as presented in
column 1 of table 18.1. The fourth column shows the moving sum of the
Kyte-Doolittle hydrophobicity values for the 11 residues centered on each
residue (i.e., the given residue itself along with the five residues on both sides).
The moving sum is conventionally multiplied by 10 for convenience. Note
that the moving sums shown for residues 90-94 and 121-125 are based on
residues not actually shown in this table.
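The moving-sum calculation can be sketched in a few lines of Python (used here only for brevity; the programs in this book are written in LISP). This is an illustrative sketch, not code from the experiments. The names KD_TENTHS and moving_sums are hypothetical, the hydrophobicity values are those of the Kyte-Doolittle scale stored in tenths so that all arithmetic stays in integers, and the 36-residue string is the neighborhood of residues 90-125 listed in table 18.2.

```python
# Kyte-Doolittle values in tenths, so every sum is an exact integer
# that is already "multiplied by 10".
KD_TENTHS = {
    "I": 45, "V": 42, "L": 38, "F": 28, "C": 25, "M": 19, "A": 18,
    "G": -4, "T": -7, "S": -8, "W": -9, "Y": -13, "P": -16,
    "H": -32, "Q": -35, "N": -35, "E": -35, "D": -35, "K": -39, "R": -45,
}


def moving_sums(sequence, half_width=5):
    """Return {offset: scaled sum} for every position with a full window.

    Offsets are 0-based indices into `sequence`. Positions closer than
    `half_width` to either end are skipped because their window would
    need residues that are not in `sequence`.
    """
    sums = {}
    for center in range(half_width, len(sequence) - half_width):
        window = sequence[center - half_width:center + half_width + 1]
        sums[center] = sum(KD_TENTHS[residue] for residue in window)
    return sums


# Residues 90-125 of mouse peripheral myelin protein 22 (table 18.2).
NEIGHBORHOOD = "LTKGGRFYITGFFQILAGLCVMSAAAIYTVRHSEWH"
FIRST_POSITION = 90

# Re-key the result by residue number instead of by string offset.
sums_by_position = {
    FIRST_POSITION + offset: value
    for offset, value in moving_sums(NEIGHBORHOOD).items()
}
print(sums_by_position[110])
```

Because only residues 90-125 are supplied, the sketch yields sums for residues 95-120 only; the entries for residues 90-94 and 121-125 would require the neighboring residues of the full protein.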
As can be seen from table 18.2, the moving sum is strongly positive (indicating hydrophobicity) throughout the transmembrane domain involving
residues 96-119 (except for the residues on the very boundaries of the
domain). Two thirds of the 24 residues are in the hydrophobic category (containing I, V, L, F, C, M, or A). Of the remaining eight of the 24 residues, seven
residues (two Gs, two Ts, two Ys, and one S) are in the neutral category (containing G, T, S, W, Y, P) and one (the Q at position 103) is in the hydrophilic
category (containing H, Q, N, E, D, K, R).
Table 18.3 is similar to table 18.2 and shows the amino acid residues at
positions 35-61 of mouse peripheral myelin protein 22, the hydrophobicity
category of the residue, and the moving sum of the Kyte-Doolittle hydrophobicity values (multiplied by 10) for the 11 residues centered on each residue.
About half of the 27 residues in positions 35-61 are neutral, about a quarter
are hydrophobic, and about a quarter are hydrophilic. As can be seen, the
moving sums are either negative (indicating hydrophilicity) or small positive
numbers. This is a very different distribution from the distribution of the 24
residues in positions 96-119 shown in table 18.2.
Figure 18.3 graphs the moving sum of the Kyte-Doolittle hydrophobicity
values (multiplied by 10) for the 11 residues centered on a given residue for
mouse peripheral myelin protein 22. No moving sum is computed for the
first and last five residues of the protein. The four distinct peaks on this graph
correspond to the four highly hydrophobic transmembrane domains of this
protein. In particular, there is a peak corresponding to positions 96-119 of
table 18.2. The graph also has negative or small positive values for the positions 35-61 shown in table 18.3.
Table 18.2 Moving sums of Kyte-Doolittle hydrophobicity values for residues
90-125 of mouse peripheral myelin protein 22.
Residue  Amino acid  Hydrophobicity  Kyte-Doolittle
number   residue     category        moving sum
90       L           Hydrophobic      -12
91       T           Neutral           -9
92       K           Hydrophilic       13
93       G           Neutral           20
94       G           Neutral          -15
95       R           Hydrophilic      -12
96       F           Hydrophobic      -22
97       Y           Neutral           13
98       I           Hydrophobic       17
99       T           Neutral           66
100      G           Neutral          108
101      F           Hydrophobic      171
102      F           Hydrophobic      139
103      Q           Hydrophilic      190
104      I           Hydrophobic      170
105      L           Hydrophobic      219
106      A           Hydrophobic      242
107      G           Neutral          206
108      L           Hydrophobic      196
109      C           Hydrophobic      249
110      V           Hydrophobic      222
111      M           Hydrophobic      229
112      S           Neutral          198
113      A           Hydrophobic      195
114      A           Hydrophobic      199
115      A           Hydrophobic      129
116      I           Hydrophobic       55
117      Y           Neutral           28
118      T           Neutral            1
119      V           Hydrophobic      -26
120      R           Hydrophilic      -76
121      H           Hydrophilic      -52
122      S           Neutral         -132
123      E           Hydrophilic     -126
124      W           Neutral         -154
125      H           Hydrophilic     -209
Table 18.3 Moving sums of Kyte-Doolittle hydrophobicity values for residues
35-61 of mouse peripheral myelin protein 22.
Residue  Amino acid  Hydrophobicity  Kyte-Doolittle
number   residue     category        moving sum
35       T           Neutral          -88
36       T           Neutral         -165
37       D           Hydrophilic     -135
38       L           Hydrophobic     -108
39       W           Neutral         -111
40       Q           Hydrophilic      -87
41       N           Hydrophilic      -62
42       C           Hydrophobic      -17
43       T           Neutral           14
44       T           Neutral           -6
45       S           Neutral           45
46       A           Hydrophobic       45
47       L           Hydrophobic       48
48       G           Neutral           48
49       A           Hydrophobic       42
50       V           Hydrophobic       41
51       Q           Hydrophilic       47
52       H           Hydrophilic       15
53       C           Hydrophobic       19
54       Y           Neutral           15
55       S           Neutral          -38
56       S           Neutral          -89
57       S           Neutral          -16
58       V           Hydrophobic      -19
59       S           Neutral          -52
60       E           Hydrophilic        3
61       W           Neutral          -24
18.2 THE FOUR VERSIONS OF THE TRANSMEMBRANE PROBLEM
Now suppose that we do not know about the concept of hydrophobicity or
hydrophilicity or any numerical hydrophobicity scales. The question arises
as to whether it is possible to examine a set of protein segments and then
perform some numerical calculation in order to classify a particular segment
as being a transmembrane domain or not.
We will approach this problem in this chapter in three ways. First, we will
attempt to solve it without automatically defined functions. Second, we will
solve it with automatically defined functions. In this second version, the
automatically defined functions will be used as detectors to create categories.
Figure 18.3 The four distinct peaks in the moving sum of the Kyte-Doolittle hydrophobicity
values correspond to the four transmembrane domains of mouse peripheral myelin protein 22.
Accordingly, this version is called the set-creating version. Third, we will use
automatically defined functions to perform ordinary arithmetic and conditional operations, rather than set-manipulating operations. This third version
is called the arithmetic-performing version.
All three versions of the problem described in this chapter correspond to
the first experiment described in Weiss, Cohen, and Indurkhya 1993. In these
three versions, the inputs to the problem are entire pre-parsed protein segments; it is not necessary to parse the entire protein sequence. Chapter 20
discusses a version of the transmembrane problem, called the lookahead
version, which involves parsing the entire protein sequence.
Before proceeding, we need to discuss two additional features of genetic
programming. Section 18.3 discusses the idea of settable variables, memory,
and state, and section 18.4 discusses restricted iteration.
18.3 THE IDEA OF SETTABLE VARIABLES, MEMORY, AND STATE
Mathematical calculations in computer programs typically employ
settable variables, memory, and state (Genetic Programming, sections 18.2
and 19.7).
Settable variables, such as M0, M1, M2, and M3, can provide memory (state)
in a computer program. At the beginning of the execution of a program, each
settable variable is initialized to some initial value appropriate to the problem
domain (e.g., 0).
The settable variables then typically acquire other values as a result of the
side-effecting action of various setting functions. Specifically, the one-argument setting function, SETM0, can be used to set M0 to a particular value.
Similarly, the setting functions SETM1, SETM2, and SETM3 can be used to set
the value of the settable variables M1, M2, and M3, respectively.
Memory can be written (i.e., the state can be set) with the setting functions
SETM0, SETM1, SETM2, and SETM3. Memory can be read (i.e., the state can be
interrogated) by merely referring to the terminals M0, M1, M2, and M3.
We anticipate that such settable variables will be useful in the mathematical calculation required to solve the transmembrane problem. Since we do
not know how many such variables are necessary, we simply make a seemingly excessive number (e.g., four) of settable variables available and allow
the evolutionary process to ignore them or to evolve a way to use them.
Teller (1993, 1994a) has extended the idea of memory to indexed memory as
described in subsection F.14.1 in appendix F. Andre (1994b) has applied indexed memory to evolve a mental model (subsection F.14.2).
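The reading and writing of settable variables can be illustrated with a minimal Python sketch (Python is used here only as a stand-in for the LISP of this book). The class name Memory and its methods are hypothetical. The sketch also assumes that a setting function returns the value it stores, so that a call can be nested inside a larger expression; this section does not state what a setting function returns.

```python
class Memory:
    """Four settable variables, M0..M3, each initialized to 0.0."""

    def __init__(self, size=4, initial=0.0):
        self.cells = [initial] * size

    def read(self, index):
        # Reading corresponds to referring to the terminal M<index>.
        return self.cells[index]

    def set(self, index, value):
        # Writing corresponds to (SETM<index> value). The assignment is
        # the side effect; returning the value is an assumption of this
        # sketch.
        self.cells[index] = value
        return value


memory = Memory()
# Equivalent in spirit to evaluating (SETM0 (+ M0 1.0)) three times.
for _ in range(3):
    memory.set(0, memory.read(0) + 1.0)
```

After the loop, M0 holds a running count while M1, M2, and M3 keep their initial values, which is how an unused variable can simply be ignored by an evolved program.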
18.4 THE IDEA OF RESTRICTED ITERATION
Typical computer programs contain iterative operators, which perform some
specified work until some condition expressed by a termination predicate is
satisfied. Genetic programming is capable of evolving programs with iterative operators. For example, the iterative DU operator (Do Until) used in the
block stacking problem (Genetic Programming, section 18.1) is a two-argument
operator that iteratively performs the work specified by its first argument
until the termination predicate specified by its second argument is satisfied.
In the block stacking problem, the DU operator (Do Until) was permitted to
appear in a program without restriction as to the number of its occurrences
within the overall program and without restriction as to its locations within
the program. The SIGMA operator for iterative summation was similarly
unrestricted (Genetic Programming, section 18.2).
Of course, in a genetically evolved program both the work and the termination predicate of each occurrence of an iterative operator are initially created
at random. Both are subsequently subjected to modification by the crossover
operation. Consequently, iterative operators will, at best, be nested and consume enormous amounts of computer time or will, at worst, have unsatisfiable
termination predicates and go into infinite loops.
One way to avoid these pitfalls is to impose time-out limits on each iterative loop individually and on all iterative loops cumulatively. These necessary limits are somewhat arbitrary. Even when such time-out limits are
imposed, programs containing iterative operators are still extremely time-consuming. The worst performing and least interesting programs in the population usually consume the most computer time.
In problems where we can envisage one iterative calculation being usefully performed over a particular known, finite set, there is an attractive alternative to imposing arbitrary time-out limits. For such problems, the iteration
can be restricted to exactly one iteration (or a specified number of iterations)
over the finite set. In this restricted iteration (poor man's iteration), the termination predicate is fixed, guaranteed to be triggered in a definite amount of
time, and is not subject to evolutionary modification. No nested iterations or
infinite loops are possible. The amount of computer time is capped and knowable from the usual factors (i.e., population size, number of generations, size
of the programs in the population, number of fitness cases, nature of the fitness measure, and the nature of the problem).
In the case of certain problems involving the examination of the residues of
a protein, iteration can reasonably be limited to the ordered set of amino acid
residues of the protein sequence or protein segment involved. Thus, for this
problem, there can be one iteration-performing branch, with the iteration
restricted to the ordered set of amino acid residues in the protein segment.
Each time iterative work is performed, the pointer identifying the current
residue of the protein is advanced to the next residue of the protein segment
until the end of the entire protein segment is encountered. An analogy is the
repeated pressing of the space bar of a typewriter. Each time the space bar is
pressed, the typing head moves one space to the right. However, repeated
depressing of the space bar cannot itself move the typing head beyond the
end of the typewriter carriage or cause the typing head to return to the far
left. When the iteration-performing branch is finished, the result-producing
branch produces the final output of the overall program.
Many iterative calculations work in conjunction with memory (state). Typically the work varies depending on the current value of the iteration variable
(index) and the current contents of the memory. The memory transmits information from one execution of the iterative calculation to the next. In this problem the same work is executed as many times as there are residues in a protein
segment, so the iteration variable here is the amino acid residue at the current
position in the protein segment. Depending on the problem, the iteration variable may be explicitly available or be implicitly available through functions
that permit it to be interrogated. For this problem, there will be no need for
the iteration variable to be explicitly available in the terminal set.
In this problem, each settable variable is initialized to zero at the beginning
of the execution of the iteration-performing branch. The settable variables
then typically acquire some final value as a result of the work performed by
the iteration. We make four settable variables, M0, M1, M2, and M3, available to
the iterative calculation of this problem.
The following code employing the LOOP macro of Common LISP (Steele
1990) precisely specifies the operation of restricted iteration for one protein
segment (fitness case) for this problem.
1 (loop initially (progn (setf M0 0.0) (setf M1 0.0)
2                        (setf M2 0.0) (setf M3 0.0))
3       for residue-index from 0 below (length protein-segment)
4       for residue = (aref protein-segment residue-index)
5       do (eval IPB0)
6       finally (return (wrapper (eval RPB))))
In lines 1 and 2, the settable variables, M0, M1, M2, and M3, are each set to an
initial value of 0.
Line 3 specifies that the indexing variable, residue-index, will start at 0
and run up to one less than the length of the array (vector) protein-segment.
In line 4, the array protein-segment is referenced with the array-referencing function, aref, to extract the element (the amino acid residue) identified by the indexing variable residue-index. The variable residue is
bound to the extracted value. This binding enables the yet-to-be-evolved program to detect whether the current residue is a particular amino acid.
In line 5, the iteration-performing branch, IPB0, is evaluated using eval
successively for each residue in the protein-segment. The iteration-performing branch, IPB0, would typically contain references to the settable
variables M0, M1, M2, and M3 and the automatically defined functions ADF0,
ADF1, and ADF2 (if they are involved).
In line 6, the result-producing branch, RPB, is evaluated using eval after
IPB0 has been invoked for the last time (i.e., on the last residue of the protein-segment). The result-producing branch, RPB, typically contains references to the settable variables M0, M1, M2, and M3. The return in the finally
clause causes the result of evaluating the wrapperized value of RPB to be
returned as the overall result of the program's execution for the current fitness case. The wrapperized value of RPB is the classification of the protein-segment as a transmembrane domain or a non-transmembrane area.
In the genetically evolved programs presented later, the code on lines 1-5 is
called looping-over-residues; the result-producing branch (line 6)
appears below its usual values function.
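For readers less familiar with the LOOP macro, the same control flow can be sketched in Python. This is only an illustration of restricted iteration under stated assumptions: run_program, example_ipb0, and example_rpb are hypothetical names, the two branches are ordinary Python callables rather than evolved LISP expressions, and the hand-written example branches are not an evolved solution. The wrapper follows the rule given in section 18.5.1 (a positive value means a transmembrane domain).

```python
def wrapper(value):
    # A positive output classifies the segment as a transmembrane domain.
    return value > 0


def run_program(ipb0, rpb, protein_segment):
    """Restricted iteration over one protein segment (one fitness case).

    `ipb0(residue, memory)` is called exactly once per residue and
    `rpb(memory)` exactly once afterwards, so the loop always terminates.
    """
    memory = [0.0, 0.0, 0.0, 0.0]      # lines 1-2: M0..M3 start at zero
    for residue in protein_segment:    # lines 3-4: fixed, finite iteration
        ipb0(residue, memory)          # line 5: iteration-performing branch
    return wrapper(rpb(memory))        # line 6: result-producing branch


# Hypothetical hand-written branches, for illustration only.
HYDROPHOBIC = set("IVLFCMA")


def example_ipb0(residue, memory):
    # Add 1 to M0 for a hydrophobic residue, subtract 1 otherwise.
    memory[0] += 1.0 if residue in HYDROPHOBIC else -1.0


def example_rpb(memory):
    return memory[0]


is_domain = run_program(example_ipb0, example_rpb, "FYITGFFQILAGLCVMSAAAIYTV")
is_other = run_program(example_ipb0, example_rpb, "TTDLWQNCTTSALGAVQHCYSSSVSEW")
```

The termination condition lives entirely in run_program, outside both branches, which is the point of restricted iteration: nothing an evolved branch does can lengthen or nest the loop.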
18.5 PREPARATORY STEPS WITHOUT ADFs
The yet-to-be-evolved program without automatically defined functions for
predicting whether a given protein segment is a transmembrane domain
should be capable of performing three tasks. First, it should be able to
interrogate the residues and perform some calculation (e.g., grouping them
into useful categories). Second, it should be able to iteratively perform
some yet-to-be-determined arithmetic calculations and conditional operations involving the as-yet-undiscovered categorizations. Third, it should
be able to perform some final yet-to-be-determined arithmetic calculations
and conditional operations to reach a decision using the intermediate
results produced by the as-yet-undiscovered iteration. A predicting program without automatically defined functions might perform the first two
tasks in an iteration-performing branch and it might perform the final
task in a final branch. Even though automatically defined functions are
not involved in the discussion in this section, this final branch can nonetheless be aptly called a result-producing branch.
Figure 18.4 shows the structure of a two-branch predicting program without automatically defined functions consisting of an iteration-performing
branch, IPB0, and a result-producing branch, RPB.
Figure 18.4 Overall two-branch program consisting of an iteration-performing branch, IPB0,
and a result-producing branch, RPB, for the subset-creating version of the transmembrane problem without ADFs.
18.5.1 Terminal Set and Function Set
We now consider the terminal set and function set for each branch of the
overall two-branch predicting program for the transmembrane problem without automatically defined functions.
A program for creating categories of amino acids and taking different actions
based on the category to which a particular residue belongs must be able to
determine what residue is at a certain position in the protein segment. In
addition, such a program must be able to form categories based on the outcome of the interrogation.
Since we anticipate that numerical calculations will subsequently be performed on the presence or absence of a particular residue at a particular position in the protein segment, it seems reasonable to employ numerically-valued
logic returning numerical values such as -1 and +1, rather than Boolean-valued logic returning values such as T or NIL. Numerically-valued logic
permits the results of the residue-detecting operations to be freely combined
with arithmetic operations and numerical constants into more complicated
calculations.
One way to implement this approach is to define 20 numerically-valued
zero-argument functions for determining whether the current residue in a
protein segment is a particular amino acid. For example, (A?) is the zero-argument residue-detecting function returning a numerical +1 if the current
residue is alanine (A) but otherwise returning a numerical -1. A similar residue-detecting function is defined for each of the 19 other amino acids. Since
these 20 functions take no arguments, they are considered terminals in accordance with our usual convention in this book.
The length of the current protein segment, LEN, is a potentially useful terminal in the contemplated calculations. The settable variables, M0, M1, M2,
and M3, provide memory (state) for the contemplated iterations. The random
constants, $\Re_{\text{bigger-reals}}$, range between -10.000 and +10.000 (with a granularity
of 0.001).
Thus, the terminal set, $\mathcal{T}_{ipb0}$, for the iteration-performing branch, IPB0,
contains the 20 zero-argument numerically-valued residue-detecting functions, the constant terminal LEN, the settable variables M0, M1, M2, and M3,
and the random constants, $\Re_{\text{bigger-reals}}$. That is,
$$\mathcal{T}_{ipb0} = \{\texttt{(A?)}, \texttt{(C?)}, \ldots, \texttt{(Y?)}, \texttt{LEN}, \texttt{M0}, \texttt{M1}, \texttt{M2}, \texttt{M3}, \Re_{\text{bigger-reals}}\}.$$
Since we envisage that sets of amino acids will be formed into categories, it
seems potentially helpful to include the logical disjunctive function in the
function set. Specifically, ORN is the two-argument numerically-valued disjunctive function (OR) that returns +1 if either or both of its arguments are
positive, but returns -1 otherwise. For example, (ORN (A?) (C?)) returns
+1 if the current residue is either alanine (A) or cysteine (C), but returns -1 if
the current residue is any of the other 18 amino acids.
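The numerically-valued detectors and the ORN function can be sketched as follows in Python (a stand-in for the LISP used in this book). The names make_detector, DETECTORS, orn, and a_or_c are hypothetical. One deliberate difference from the text is that each detector here takes the current residue as an explicit argument, whereas the zero-argument functions of the text read it implicitly from the iteration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_detector(letter):
    """Build the analogue of (A?), (C?), ...: +1 on a match, else -1."""
    def detector(current_residue):
        return 1 if current_residue == letter else -1
    return detector


# One detector per amino acid, keyed by its one-letter code.
DETECTORS = {letter: make_detector(letter) for letter in AMINO_ACIDS}


def orn(left, right):
    """Numerically-valued OR: +1 if either argument is positive, else -1."""
    return 1 if left > 0 or right > 0 else -1


def a_or_c(current_residue):
    # Analogue of (ORN (A?) (C?)).
    return orn(DETECTORS["A"](current_residue),
               DETECTORS["C"](current_residue))
```

Because every result is +1 or -1, the output of a_or_c can be added to or multiplied with other numbers directly, which is the stated reason for preferring numerically-valued logic.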
Since we envisage that the iteration-performing branch will perform calculations and make decisions based on these calculations, it seems reasonable
to include the four arithmetic operations and a conditional operator in the
function set. We have used the four arithmetic functions (+, -, *, and %) for
performing arithmetic calculations and the conditional comparative operator IFLTE for making decisions on many previous problems, so we include
them in the function set for the iteration-performing branch. Since there are
side-effecting functions in this problem (i.e., the four setting functions), IFLTE
must be implemented as a macro as described in section 12.2.
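The reason a side-effecting argument forces a macro can be shown with a small Python sketch, in which zero-argument callables (thunks) play the role of unevaluated macro arguments. Several things here are assumptions rather than statements of this section: IFLTE is read as "if less than or equal", % is taken to be protected division returning 1 when the denominator is zero, and the names iflte, protected_div, and setm are hypothetical.

```python
def iflte(a, b, then_branch, else_branch):
    """If a() <= b(), evaluate then_branch(), otherwise else_branch().

    All four arguments are thunks, so only the chosen branch runs and
    only its side effects occur.
    """
    return then_branch() if a() <= b() else else_branch()


def protected_div(numerator, denominator):
    # Assumed convention for %: return 1 instead of dividing by zero.
    return 1.0 if denominator == 0 else numerator / denominator


memory = [0.0, 0.0, 0.0, 0.0]


def setm(index, value):
    # Analogue of SETM0..SETM3: store the value and return it.
    memory[index] = value
    return value


# Analogue of (IFLTE M0 1.0 (SETM1 5.0) (SETM2 7.0)).
result = iflte(lambda: memory[0], lambda: 1.0,
               lambda: setm(1, 5.0), lambda: setm(2, 7.0))
```

If iflte were an ordinary function receiving already-evaluated arguments, both setm calls would run before the comparison, and M2 would be overwritten even though its branch was not selected.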
The one-argument setting functions, SETM0, SETM1, SETM2, and SETM3,
can be used to set the values of the settable variables, M0, M1, M2, and M3,
respectively.
Thus, the function set, $\mathcal{F}_{ipb0}$, for the iteration-performing branch, IPB0, is
$$\mathcal{F}_{ipb0} = \{\texttt{ORN}, \texttt{SETM0}, \texttt{SETM1}, \texttt{SETM2}, \texttt{SETM3}, \texttt{IFLTE}, +, -, *, \%\}$$
with an argument map of
$$\{2, 1, 1, 1, 1, 4, 2, 2, 2, 2\}.$$
Once a program has memory (state) in the form of settable variables and
setting functions, the ability to do arithmetic, and the ability to conditionally
perform alternative calculations based on the outcome of a conditional test,
many different mathematical computations can be performed. These calculations include averages and weighted averages of the number of occurrences
of amino acids belonging to a particular dynamically defined subset of amino
acids.
The result-producing branch can then perform a non-iterative floating-point
calculation and produce the final result of the overall program. The settable
variables, M0, M1, M2, and M3, provide the way to communicate the results of
the iteration-performing branch to the result-producing branch.
The terminal set, $\mathcal{T}_{rpb}$, for the result-producing branch, RPB, is
$$\mathcal{T}_{rpb} = \{\texttt{LEN}, \texttt{M0}, \texttt{M1}, \texttt{M2}, \texttt{M3}, \Re_{\text{bigger-reals}}\}.$$
The function set, $\mathcal{F}_{rpb}$, for the result-producing branch, RPB, is
$$\mathcal{F}_{rpb} = \{\texttt{IFLTE}, +, -, *, \%\}$$
with an argument map of
$$\{4, 2, 2, 2, 2\}.$$
A wrapper is used to convert the floating-point value produced by the
result-producing branch into a binary outcome. If the genetically evolved
program returns a positive value, the segment will be classified as a transmembrane domain, but otherwise it will be classified as a non-transmembrane area.
Even though automatically defined functions are not yet involved, the overall program here contains two branches. These branches have different terminal sets and function sets. Thus, structure-preserving crossover is needed to
preserve the constrained syntactic structure used in this problem. In implementing structure-preserving crossover, separate types are assigned to the
two branches (i.e., branch typing is used).
In summary, when genetic programming without automatically defined
functions is applied to the transmembrane problem, each individual overall
two-branch program in the population consists of an iteration-performing
branch, IPB0, employing four memory cells and a result-producing branch,
RPB, employing the results of the iteration to produce a signed number that
signifies whether the given protein segment is a transmembrane domain.
18.5.2 Correlation as the Fitness Measure
The fitness cases for this problem consist of protein segments extracted from
a sample of proteins. Fitness will measure how well a genetically evolved
program predicts whether the segment is a transmembrane domain.
When a genetically evolved program in the population is tested against a
particular fitness case, the outcome can be
• a true-positive (i.e., the program correctly predicts that the given segment
is a transmembrane domain when the segment is, in fact, transmembrane),
• a true-negative (i.e., the program correctly predicts that the given segment
is not a transmembrane domain when the segment is, in fact, not transmembrane),
• a false-positive (i.e., the program overpredicts that the given segment is a
transmembrane domain when the segment is, in fact, not transmembrane), or
• a false-negative (i.e., the program underpredicts that the given segment is
not a transmembrane domain when the segment is, in fact, transmembrane).
The sum of the number of true positives ($N_{tp}$), the number of true negatives
($N_{tn}$), the number of false positives ($N_{fp}$), and the number of false negatives
($N_{fn}$) equals the total number of fitness cases, $N_{fc}$:
$$N_{fc} = N_{tp} + N_{tn} + N_{fp} + N_{fn}.$$
The performance of a predicting algorithm can be measured in several ways
using $N_{tp}$, $N_{tn}$, $N_{fp}$, $N_{fn}$, and $N_{fc}$.
One frequently used way of measuring the performance of a predicting
program is to measure its accuracy. The accuracy measure, $Q_3$, is the number
of fitness cases for which the predicting program is correct divided by the
total number of fitness cases. That is,
$$Q_3 = \frac{N_{tp} + N_{tn}}{N_{fc}}.$$
A value of accuracy of 1.0 is best; 0.0 is worst. The accuracy measure, $Q_3$, does
not consider the amount of overprediction represented by $N_{fp}$ or the amount
of underprediction represented by $N_{fn}$. $Q_3$ is a somewhat specious measure
because the significance of a particular reported value of $Q_3$ is highly dependent on the frequency of appearance of the characteristic being studied. For
example, if 53% of the examples in a three-way classification problem belong
to one class, then a predicting program that blindly classifies every example
as belonging to that class will achieve an accuracy $Q_3$ of 53% (Stolorz, Lapedes,
and Xia 1992). Similarly, if 95% of the fitness cases are examples of a
characteristic, then a predicting program that blindly classifies every example
as positive will achieve an accuracy $Q_3$ of 95% (Matthews 1975).
The error rate, $E$, is the number of fitness cases for which the predicting
program is incorrect divided by the total number of fitness cases (Weiss, Cohen,
and Indurkhya 1993). That is,
$$E = \frac{N_{fp} + N_{fn}}{N_{fc}} = 1 - Q_3.$$
This frequently used measure suffers from the same deficiencies as the accuracy measure $Q_3$.
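The weakness of accuracy and error rate on unbalanced data can be checked numerically with a short Python sketch. The function names confusion_counts, accuracy, and error_rate are hypothetical, and the 95-to-5 split of fitness cases is made-up illustrative data, not data from the experiments in this chapter.

```python
def confusion_counts(predictions, observations):
    """Return (n_tp, n_tn, n_fp, n_fn) for paired boolean sequences."""
    n_tp = n_tn = n_fp = n_fn = 0
    for predicted, observed in zip(predictions, observations):
        if predicted and observed:
            n_tp += 1
        elif not predicted and not observed:
            n_tn += 1
        elif predicted:
            n_fp += 1      # predicted positive, actually negative
        else:
            n_fn += 1      # predicted negative, actually positive
    return n_tp, n_tn, n_fp, n_fn


def accuracy(n_tp, n_tn, n_fp, n_fn):
    return (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)


def error_rate(n_tp, n_tn, n_fp, n_fn):
    return (n_fp + n_fn) / (n_tp + n_tn + n_fp + n_fn)


# Illustrative data: 95 positive cases, 5 negative cases, and a
# predictor that blindly answers "positive" every time.
observations = [True] * 95 + [False] * 5
predictions = [True] * 100
counts = confusion_counts(predictions, observations)
```

With these counts the blind predictor scores an accuracy of 0.95 and an error rate of 0.05, even though it gets every negative case wrong.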
Another way to measure the performance of a predicting program is to
measure the percentage of agreement, $c_a$, between a program's prediction and
the observed reality (Matthews 1975):
$$c_a = \frac{100 N_{tp}}{N_{tp} + N_{fn}}.$$
A value of $c_a$ of 100% is best; 0% is worst. However, $c_a$ alone is also a somewhat specious measure, since it does not take into account the amount of
overprediction (i.e., false positives). For example, a program that always makes
a positive prediction would achieve a value of 100% for $c_a$.
The inadequacy of $c_\alpha$ can be counterbalanced by combining it with a measure of overprediction, $c_{n\alpha}$. In their early work on predicting the secondary structure of proteins, Chou and Fasman (1974b) used $Q_\alpha$, a measure that combines $c_\alpha$ with a measure of overprediction, $c_{n\alpha}$. Specifically, $c_{n\alpha}$ is the percentage of negative cases that the program correctly predicts to be negative cases:
Chapter 18
$$c_{n\alpha} = \frac{100\,N_{tn}}{N_{tn} + N_{fp}}.$$
$c_\alpha$ and $c_{n\alpha}$ are then averaged to yield
$$Q_\alpha = \frac{c_\alpha + c_{n\alpha}}{2}.$$
The $Q_\alpha$ measure gives an overall estimate of the agreement between the predictions and the observed reality. If a predicting technique is accurate, $c_\alpha$, $c_{n\alpha}$, and $Q_\alpha$ will be 100%. A predicting program that always makes a positive prediction would have a large amount of overprediction (i.e., a large value of $N_{fp}$) and would achieve a relatively low value of $c_{n\alpha}$ and a relatively low average $Q_\alpha$.
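A minimal Python sketch of the percentage of agreement, the overprediction-sensitive counterpart, and their average follows. The function names are assumptions made for illustration, and the example counts (123 positive and 123 negative cases, all predicted positive) are chosen to match the size of the in-sample set described later in the chapter.

```python
def percent_agreement(n_tp, n_fn):
    """Percentage of positive cases that are predicted to be positive."""
    return 100.0 * n_tp / (n_tp + n_fn)


def percent_negative_agreement(n_tn, n_fp):
    """Percentage of negative cases that are predicted to be negative."""
    return 100.0 * n_tn / (n_tn + n_fp)


def q_alpha(n_tp, n_tn, n_fp, n_fn):
    """Average of the two percentages above."""
    return (percent_agreement(n_tp, n_fn)
            + percent_negative_agreement(n_tn, n_fp)) / 2.0


# A predictor that always answers "positive" on 123 positive and
# 123 negative cases: every negative case becomes a false positive.
c_a = percent_agreement(n_tp=123, n_fn=0)
c_na = percent_negative_agreement(n_tn=0, n_fp=123)
q_a = q_alpha(n_tp=123, n_tn=0, n_fp=123, n_fn=0)
print(c_a, c_na, q_a)
```

The always-positive predictor looks perfect on the first percentage alone, while the averaged measure exposes the overprediction.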
Each of the above performance measures is a potential candidate for a fitness measure for genetic programming; however, each has shortcomings. Matthews (1975) points out that the correlation between the prediction and the observed reality is a more general measure that avoids the shortcomings of each of the above measures. As it happens, the calculation of correlation is considerably simplified when the predictions and observations each take on only two possible values.
Let $P_j$ represent the prediction for fitness case $j$ (i.e., $P_j$ is the output of a genetically evolved program; it is 1 if protein segment $j$ is predicted to be transmembrane and is 0 if the segment is predicted to be non-transmembrane). Let $S_j$ represent the observed structure for fitness case $j$ (i.e., $S_j$ is 1 if protein segment $j$ is observed to be transmembrane and is 0 if the segment is non-transmembrane).
The correlation $C$ between the prediction $P_j$ and the observation $S_j$ is, in general, given by
$$C = \frac{\sum_j (S_j - \bar{S})(P_j - \bar{P})}{\sqrt{\sum_j (S_j - \bar{S})^2 \sum_j (P_j - \bar{P})^2}}$$
(Fisher 1918; Matthews 1975), where $\bar{P}$ and $\bar{S}$ are the mean values of $P_j$ and $S_j$, respectively, and the summations are over all $N_{fc}$ fitness cases.
As Matthews (1975) points out, for the special case where $P_j$ and $S_j$ are step functions taking only the values of 0 or 1, the correlation $C$ becomes
$$C = \frac{N_{tp}/N_{fc} - \bar{S}\,\bar{P}}{\sqrt{\bar{P}\,\bar{S}\,(1 - \bar{S})(1 - \bar{P})}}.$$
Here $\bar{S}$ is the fraction of the fitness cases that are observed to be transmembrane; that is,
$$\bar{S} = \frac{N_{tp} + N_{fn}}{N_{fc}}.$$
46t Predicition of Transmembrane Domains in Proteins
$\bar{P}$ is the fraction of the fitness cases that are predicted to be transmembrane,
$$\bar{P} = \frac{N_{tp} + N_{fp}}{N_{fc}}.$$
As Matthews (1975) observes, the correlation coefficient indicates how much better a particular predictor is than a random predictor. A correlation $C$ of +1.0 indicates perfect agreement between a predictor and the observed reality; a correlation $C$ of -1.0 indicates total disagreement; a correlation $C$ of 0.0 indicates that the predictor is no better than random.
As previously mentioned in section 16.2, this formula for correlation in this problem, where the predictions and observations of a classification problem take on only two possible values, is equivalent to a calculation of the cosine of the angle in a space of dimensionality $N_{fc}$ between the zero-mean vector of length $N_{fc}$ of correct answers and the zero-mean vector of length $N_{fc}$ of predictions. A correlation $C$ of -1.0 indicates vectors pointing in opposite directions in $N_{fc}$-space; a correlation of +1.0 indicates coincident vectors; a correlation of 0.0 indicates orthogonal vectors. For a two-way classification problem, correlation can also be computed as
$$C = \frac{N_{tp}N_{tn} - N_{fn}N_{fp}}{\sqrt{(N_{tn} + N_{fn})(N_{tn} + N_{fp})(N_{tp} + N_{fn})(N_{tp} + N_{fp})}}.$$
Note that $C$ is set to 0 when the denominator is 0.
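The equivalence between the general correlation formula and the four-count form can be checked numerically. The sketch below is only an illustration: the function names and the eight-case toy data are assumptions, not material from the book.

```python
import math


def correlation_from_counts(n_tp, n_tn, n_fp, n_fn):
    """Correlation C from the four counts; 0 when the denominator is 0."""
    denom = math.sqrt((n_tn + n_fn) * (n_tn + n_fp)
                      * (n_tp + n_fn) * (n_tp + n_fp))
    if denom == 0:
        return 0.0
    return (n_tp * n_tn - n_fn * n_fp) / denom


def correlation_from_vectors(predicted, observed):
    """General correlation of two equal-length 0/1 sequences."""
    n = len(observed)
    p_bar = sum(predicted) / n
    s_bar = sum(observed) / n
    num = sum((s - s_bar) * (p - p_bar) for p, s in zip(predicted, observed))
    den = math.sqrt(sum((s - s_bar) ** 2 for s in observed)
                    * sum((p - p_bar) ** 2 for p in predicted))
    return 0.0 if den == 0 else num / den


# Made-up toy data: 1 = transmembrane, 0 = non-transmembrane.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

pairs = list(zip(predicted, observed))
n_tp = sum(1 for p, s in pairs if p == 1 and s == 1)
n_tn = sum(1 for p, s in pairs if p == 0 and s == 0)
n_fp = sum(1 for p, s in pairs if p == 1 and s == 0)
n_fn = sum(1 for p, s in pairs if p == 0 and s == 1)

c_counts = correlation_from_counts(n_tp, n_tn, n_fp, n_fn)
c_vectors = correlation_from_vectors(predicted, observed)
print(c_counts, c_vectors)
```

Both routes give the same value on this toy data, and a predictor that answers "positive" for every case falls into the zero-denominator branch.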
Accordingly, $C$ lends itself immediately to being the measure of raw fitness for a genetically evolved computer program. Since raw fitness ranges between -1.0 and +1.0 (higher values being better), standardized fitness can then be defined as
$$\frac{1 - C}{2}.$$
Standardized fitness ranges between 0.0 and +1.0, lower values being better and 0 being best. A standardized fitness of 0 indicates perfect agreement between the predicting program and the observed reality (the correct answer); +1.0 indicates total disagreement; and 0.50 indicates that the predictor is no better than random.
18.5.3 Fitness Cases
Release 25 of the SWISS-PROT protein data bank contains 248 mouse proteins with transmembrane domains identified in their SWISS-PROT feature tables. These proteins average 499.8 amino acid residues in length. Each such protein contains between one and 12 transmembrane domains, the average being 2.4. The transmembrane domains range in length from 15 to 101 residues, with an average of 23.0.
Of these 248 proteins, 123 are randomly chosen to create the in-sample set of fitness cases to measure fitness during the evolutionary process. One of the transmembrane domains of each of these 123 proteins is chosen at random as a positive fitness case for the in-sample set. Then, one equally long segment that is not contained in any of the protein's transmembrane domains is randomly chosen from each protein as a negative fitness case. As a result, there are 123 positive and 123 negative fitness cases in the in-sample set of fitness cases.
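The construction of one positive and one negative fitness case per protein can be sketched as follows. This is a hedged illustration only: the function name, the inclusive 1-based `(start, end)` representation, the 400-residue protein, and its two domains are all hypothetical, and "not contained in any transmembrane domain" is interpreted here as "does not overlap any transmembrane domain", which is an assumption.

```python
import random


def sample_fitness_cases(protein_length, tm_domains, rng):
    """Pick one TM domain and one equally long non-TM segment.

    Positions are inclusive and 1-based. Returns (positive, negative);
    negative is None if no non-overlapping segment of that length exists.
    """
    positive = rng.choice(tm_domains)
    seg_len = positive[1] - positive[0] + 1
    candidates = []
    for start in range(1, protein_length - seg_len + 2):
        end = start + seg_len - 1
        overlaps = any(start <= d_end and end >= d_start
                       for d_start, d_end in tm_domains)
        if not overlaps:
            candidates.append((start, end))
    negative = rng.choice(candidates) if candidates else None
    return positive, negative


rng = random.Random(0)              # fixed seed so the sketch is repeatable
domains = [(50, 70), (287, 305)]    # hypothetical transmembrane domains
positive, negative = sample_fitness_cases(400, domains, rng)
print(positive, negative)
```

Repeating this over the 123 in-sample proteins would yield the 123 positive and 123 negative cases described above.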
Table 18.4 shows the 246 in-sample fitness cases. The first column names the protein; the second column gives the length of the protein; and the third column shows the number of transmembrane domains in the protein. The fourth and fifth columns apply to the particular randomly chosen transmembrane domain (positive fitness case). The sixth and seventh columns apply to the one randomly chosen non-transmembrane area (negative fitness case) of the protein. For example, row one shows that the 19-residue transmembrane domain located at positions 287-305 (one of two transmembrane domains in the protein) and the 19-residue non-transmembrane area located at positions 330-348 are chosen from the 3BH1_MOUSE protein.
Genetic programming is driven by fitness as measured by the set of in-sample fitness cases. However, the true measure of performance for a recognizing program is how well it generalizes to different cases from the same problem environment. 250 out-of-sample fitness cases (125 positive and 125 negative) are then created from the remaining 125 proteins in the same manner as that described above. These out-of-sample fitness cases are then used to validate the performance of the genetically evolved predicting programs. Table 18.5 shows the 250 out-of-sample fitness cases.
An auxiliary hits measure has usually proved useful for externally monitoring runs of genetic programming. Since the hits measure is not used by genetic programming, it seemed most useful to base the definition of hits on the performance of the predicting program on the out-of-sample fitness cases. Therefore, hits is defined as the nearest integer to 100 × (1.0 - standardized fitness) for the out-of-sample set. A genetically evolved program with an out-of-sample correlation $C$ of 1.00 will score 100 hits. Since only the best-of-generation programs (identified using in-sample fitness) are tested against the out-of-sample fitness cases, the hits measure is only computed for the best-of-generation programs.
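The mapping from correlation to standardized fitness and then to hits can be sketched as below. The function names are assumptions; rounding is done with `floor(x + 0.5)` because Python's built-in `round` resolves exact ties to the even neighbour, which may differ from the conventional "nearest integer".

```python
import math


def standardized_fitness(c):
    """Map raw fitness C in [-1, +1] to [0, 1], where 0 is best."""
    return (1.0 - c) / 2.0


def hits(c_out_of_sample):
    """Nearest integer to 100 x (1.0 - standardized fitness)."""
    value = 100.0 * (1.0 - standardized_fitness(c_out_of_sample))
    return int(math.floor(value + 0.5))


for c in (1.0, 0.0, -1.0):
    print(c, standardized_fitness(c), hits(c))
```

A perfect out-of-sample correlation gives 100 hits, a random-level correlation of 0.0 gives 50, and total disagreement gives 0.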
Even with iteration restricted to a single loop in the iteration-performing branch, this problem proved extremely time-consuming. Moreover, on our first four runs of this problem we initially used an environment consisting of fewer in-sample fitness cases than described above and discovered that there was an undesirably large divergence in the values of the in-sample correlation and out-of-sample correlation. Increasing the number of in-sample fitness cases to the full number described above, of course, aggravated the problem of computer time. Therefore, we compromised on the maximum number of generations to be run and set G to 21.
Since we had no idea in advance what values of correlation to expect on this problem, we ran this problem with no success predicate so that all runs would continue for the full 21 generations. We then examined the values of
Table 18.4 In-sample fitness cases.
3BH1_MOUSE I EZZ 2
2
4
7
7
1.
1
4
4
7
7
7
1
1
7
10
1
1
1
1.
1
1
1
1
T
1
1
1
12
6
6
2
t
4
4
4
4
2
1
1
1
1
1
J
I
4
19
19
20
25
24
24
17
19
19
23
21,
2L
28
26
23
24
24
24
23
29
24
27
27
21
21.
24
30
37
21
20
22
17
27
21
21,
21,
21.
23
T7
17
79
2l
26
26
24
21.
22
287105 | 19 330-348
330-348
38H04
235-259
277100
736-759
625-{41
391.412
381-404
2n199
168-188
225-245
7-34
55-80
363-386
829-846
24p165
79ffi21
736-758
135-163
41.8-41.
r3u1,64
43-69
6-26
12G1,46
1U-165
434-463
7+-L10
593413
L9+-21,4
21,5-239
138-154
995-1021,
110-130
1.6G186
119-1,40
31.4-3U
5-29
114-136
249165
94-L12
56-76
271,-296
93-118
31.-61.
348-368
39H19
3BH3_MOUSE I gZZ 287-305 | 19
sHT3-MOUSE I +AZ 46HU I ZO
sHTE_MOUSE | 366 2448 25
A2AB_MOUSE J 455 411434 I 24
A4_MOUSE 770 70u723 | Z+
ACE-MOUSE I rgrZ 1265-1281. | 17
ACHB_MOUSE | 501 277-295 | 22
ACHE_MOUSE I 493 273+91, | 24
ACMI_MOUSE I 460 25_47 23
AG2S_MOUSE I gSg 276-296 | zt
ANPA-MOUSE I 1.057 470490 | 21
ATNC_MOUSE I 290 28
AVRB_MOUSE | 536 135-160 | 26
B2AR_MOUSE I +rA 107-129 | 24
B3AT-MOUSE I gZg 424-447 | 18
BASI-MOUSE I ZZg 21c-233 | 24
CADE-MOUSE I 884 710-733 | 24
CADP-MOUSE I 822 64H70 | 23
CD11_MOUSE I gs' 29U326 | 29
CD19_MOUSE I S+Z 288-311 | 24
CD3D_MOUSE I r73 10r-127 | 27
CD3G-MOUSE I 782 112-1.38 | 27
CD3Z-MOUSE I 164 31-51 21
CD44_MOUSE | 363 271,-291, | 21
CD4L-MOUSE I ZOO 23-4!6 | 24
CDs_MOUSE 494 37240t | 30
CDSA-MOUSE I 247 1,8+-220 | 37
CFTR_MOUSE | 1476 1.009-1029 | 21,
CIKI_MOUSE I 495 290-309 | 21,
CrKD_MOUSE I s11 345-366 | 22
COg_MOUSE 528 292108 | 17
CR2_MOUSE I 1025 964-990 | 27
CX26_MOUSE | 226 190-210 | 21
CX32_MOUSE | 283 1,89109 | 21,
CX4O_MOUSE I gSZ 205-225 | 22
CX45_MOUSE I Sg0 189-2W | 2L
D3DR_MOUSE | 46 33-55 25
EPOR-MOUSE I SOZ 135-151 | 23
FASA_MOUSE I EZZ 17U186 | 17
FCEA_MOUSE | 250 205-223 | 19
FCEG_MOUSE I EO 24-44 2L
FCGo-MOUSE I gzg 211-236 | 26
FCGX_MOUSE I 283 211-236 | 26
FLAP_MOUSE I 153 5-28 31
FURr_MOUSE | 7e3 715-735 | 21,
GAA3 MOUSE I 492
338-359 | 22
Protein | Length | Number of transmembrane domains | Length of chosen transmembrane domain | Location of the chosen transmembrane domain | Length of chosen non-transmembrane segment | Chosen non-transmembrane area
GAC2_MOUSE
GAD-MOUSE
GClM_MOUSE
GCAM-MOUSE
GGNT-MOUSE
GLP-MOUSE
GLRB-MOUSE
GRPR_MOUSE
GTR2-MOUSE
HAlO_MOUSE
HA12_MOUSE
HA14_MOUSE
HA17_MOUSE
HAlB_MOUSE
HAlK-MOUSE
HAlQ_MOUSE
HAlU-MOUSE
HA21-MOUSE
HA23*MOUSE
HA2D_MOUSE
HA2I_MOUSE
HA2Q_MOUSE
HA2S_MOUSE
HAMl-MOUSE
HB22-MOUSE
HB24_MOUSE
HB2D-MOUSE
HB2I-MOUSE
HB2K-MOUSE
HB2S_MOUSE
ICAl_MOUSE
ILlS-MOUSE
IL2B-MOUSE
IL5R-MOUSE
ILTR-MOUSE
INGR_MOUSE
IP3R-MOUSE
ITA5-MOUSE
ITAM_MOUSE
ITB2_MOUSE
KFMS-MOUSE
KKIT-MOUSE
KMET_MOUSE
LEMl_MOUSE
LEM3-MOUSE
LMA_MOUSE
LMP2_MOUSE
LSHR*MOUSE
474
M9
393
399
M7
168
883
3U
523
322
365
368
334
369
369
328
361.
255
229
256
254
221,
233
577
264
2&
265
264
263
263
3J/
410
539
415
459
477
2749
409
1153
770
976
975
L379
372
768
3084
415
700
4
4
1
1
1
1
4
7
t2
1
1,
1
1
1
1
t
1
1
1
1
1
1
1
6
1
7
1
1.
t
1
1
1
1
1
1
1
8
1
1
1
1
1
1
1
1
1
1
7
23
23
18
18
23
23
20
21
21,
15
23
27
22
23
23
24
20
26
26
26
26
26
26
22
23
32
2L
23
21
21,
24
25
28
22
25
24
L7
26
24
23
25
23
23
23
24
77
25
28
299-32L
275-297
34U357
346-363
7+9
1,09-131.
54G'65
266-286
43H53
30u322
312-334
30rt-330
31L-332
30G328
306_328
266-289
315-334
217-242
191--216
219144
217-242
184-209
t9G?21
16-37
226-248
217-248
227-247
226-248
225145
225-245
486-509
35G381
241-268
340161
24U264
25+-2n
239L1407
35G381
1106-1129
702-124
512436
520-542
932-954
33F355
710-733
2337-2353
38H04
363-390
24
23
18
18
23
23
19
21
21
15
23
27
22
23
23
24
20
26
26
26
26
26
26
20
23
32
21
23
2'1.
21.
24
26
28
22
25
24
17
26
24
23
25
23
23
23
24
17
25
24
125-1.48
11+-L36
367-3U
37T390
228-250
139-1.61.
85H68
346-365
260-280
t47-1.61.
145-167
337-363
145-1,66
338-360
338-360
29V321
1.48-1.67
96-121,
83_108
97-122
96-121,
80-105
8G111
196-2L5
102-124
9T124
104-124
102-124
103-123
103-123
232-255
384409
391.4L8
160-181
350-374
t16-139
2662-2678
't66-191
542-565
340-362
745-769
249-271
45H77
156-178
34+-367
11,6t-1177
178-202
552-575
Protein
MAGL-MOUSE
MAN2_MOUSE
MB1-MOUSE
MDR2_MOUSE
MEPA_MOUSE
MPVl_MOUSE
MYPO_MOUSE
NK13_MOUSE
NKlR-MOUSE
NTTG-MOUSE
OPSD_MOUSE
PGDR_MOUSE
PGHS-MOUSE
PLR2_MOUSE
rfTPU_MOUSE
RNG6_MOUSE
SYND-MOUSE
TCBl_MOUSE
TCCl_MOUSE
TCC3-MOUSE
TEA_MOUSE
THRR_MOUSE
TNRl_MOUSE
TRBM_MOUSE
TRKB-MOUSE
TYRO-MOUSE
UDPl_MOUSE
VATL_MOUSE
NK13-MOUSE
NKTR-MOUSE
NTTG_MOUSE
OPSD_MOUSE
PGDR_MOUSE
PGHS_MOUSE
PLR2_MOUSE
PTPU_MOUSE
RNG6_MOUSE
SYND_MOUSE
TCBl_MOUSE
TCCl_MOUSE
TCC3_MOUSE
TEA-MOUSE
THRR-MOUSE
TNRl-MOUSE
TRBM-MOUSE
TRKB_MOUSE
TYRO_MOUSE
UDPl_MOUSE
VATL-MOUSE
249-268
579-599
179-201.
518-537
35U377
117-L37
65-90
132-155
4-28
58ffi05
512
25+|78
448463
262-285
t09vr119
105-126
283107
5H4
58-78
59-79
246-266
39441.6
334-356
248-271.
62ffi49
226-249
513-529
34-5s
732-155
+-28
58ffi05
5-32
25+-278
4J.H.63
262185
1098-1119
105-126
283-307
50-84
58-78
59-79
246-266
394.416
33+-356
24U271.
62ffi49
?26-249
51T529
34-55
637
1150
220
1276
760
176
248
223
407
633
348
1098
602
292
1.452
261
311
173
167
169
453
430
454
J//
821.
533
530
155
223
407
633
348
1098
602
292
1.452
261
311
173
1.57
169
453
430
4il
577
821.
533
530
155
111
1212117
12n111111111771
1.
1, 114
1,
T27111111
1. 117
,7111114
20
21.
23
42
28
21
26
24
23
18
25
25
16
24
22
22
25
35
21
21.
25
?n
23
24
24
24
17
26
24
23
18
25
25
1.6
24
22
22
25
35
27
21,
25
20
23
24
24
24
17
26
517-536
6-26
I37-L59
191-232
727-754
94-1L4
15+-179
4043
32-54
295312
3741.
531-555
293108
23U253
74T764
231-252
253-2n
133-1.67
135-155
137-157
324-3M
27+-293
2rT235
518-541
$4453
47M97
494-510
r27-1.52
4M3
32-54
295-312
37-41
531-555
293108
23A-253
743-764
231-252
25T277
t33-L67
135-155
137-L57
320144
274-293
213-235
518-541
430453
474497
49+-510
127-r52
20
21
23
20
28
2L
26
24
25
20
28
25
16
24
22
22
25
35
2L
21
21,
23
23
24
24
24
17
22
24
25
20
28
25
76
24
22
22
25
35
21,
2't
21.
23
23
24
24
24
17
72
Table 18.5 Out-of-sample fitness cases.
Protein | Length | Number of transmembrane domains | Length of chosen transmembrane domain | Chosen transmembrane domain | Length of chosen non-transmembrane segment | Chosen non-transmembrane area
3BH2_MOUSE
4F2-MOUSE
5HTB*MOUSE
A2AA-MOUSE
A2AC-MOUSE
ACET-MOUSE
ACHA_MOUSE
ACHD_MOUSE
ACHG_MOUSE
AG2R-MOUSE
AMPE_MOUSE
ATNB_MOUSE
AVR2_MOUSE
829_MOUSE
83AR_MOUSE
83LP_MOUSE
C114-MOUSE
CADN-MOUSE
CAML-MOUSE
CD12-MOUSE
CD2-MOUSE
CD3E-MOUSE
CD3H-MOUSE
CD4O_MOUSE
CD4s_MOUSE
CD4_MOUSE
CD72-MOUSE
CDSB_MOUSE
CIKO_MOUSE
CIK3-MOUSE
CNCG_MOUSE
COX2-MOUSE
CTL4-MOUSE
CX31_MOUSE
CX37-MOUSE
CX43-MOUSE
CX5O_MOUSE
DTCM_MOUSE
EVI2_MOUSE
FCE2-MOUSE
FCEB-MOUSE
FCGl_MOUSE
FCG3_MOUSE
FGRl_MOUSE
FLK2_MOUSE
GAA2-MOUSE
GAA6_MOUSE
265
526
385
450
458
732
457
520
519
359
945
304
513
228
388
1237
573
906
1260
336
344
189
206
305
1152
457
354
213
129
530
683
227
223
270
332
381
439
291,
223
331
235
404
261.
822
992
451
443
t
1
7
7
7
1
4
4
4
7
1
1
1
I
7
10
1
1
1,
1.
t
1
1
1
a
1
I
1
1
6
6
2
1
4
4
4
4
1
I
1
4
1
1
1,
1,
n.+
4
19
24
22
25
26
17
20
25
25
22
23
28
26
22
21.
22
28
22
23
29
26
26
21.
22
22
23
21
31,
23
19
19
?2
26
21
21,
21,
21
18
26
25
20
23
20
21
20
22
22
180-198
7G99
311-332
r07-131.
89-11.4
685-70r
297116
249-273
241-265
193114
18-40
35-62
L3G167
159-180
32+-3M
822-843
481-508
725-746
1t2+-1146
298126
20+-229
109-\34
31-51
194-215
426447
395417
96-116
168-198
44--66
31.6134
190-208
2748
1.62-787
186+06
78-98
76-96
761-181
263-280
12GI5l
2449
90-109
298120
216135
377197
545-564
31T334
233-254
19
24
20
26
24
17
19
25
22
23
23
28
26
22
23
23
28
22
23
29
26
26
21
22
22
^ a
ZJ
21.
37
23
22
27
20
26
21.
21,
2L
21
18
26
26
20
20
21
20
22
22
223-241.
302125
143-162
5-30
294-317
709-725
107-125
113-137
498-519
3-25
482-504
170-197
56-81
t9+215
131-153
1059-1081
528-555
352-373
1193-1215
135-163
275100
150-175
119-r39
250-27L
203-224
427449
38-58
69-99
11-33
84-105
330-350
146*1.65
193-218
108-128
270+90
181,-201
119-139
123-140
175100
178103
20+-223
138-160
99-1r8
179-199
263-282
367188
352173
cx37
CX43
CX5O
FCGl_MOUSE
FCG3_MOUSF
FGRl_MOUSE
FLK2_MOUSE
GAA2-MOUS]
CAA6 MOUS]
GAC3-MOUSE
GATR_MOUSE
GC3M_MOUSE
GCBM_MOUSE
GHRH-MOUSE
GLRA_MOUSE
GPTO_MOUSE
GTRl_MOUSE
GTR4-MOUSE
HA11_MOUSE
HA13-MOUSE
HA1s_MOUSE
HA18-MOUSE
HAlD_MOUSE
HAlL_MOUSE
HAlT-MOUSE
HAlW_MOUSE
H422_MOUSE
HA2B-MOUSE
HA2F_MOUSE
HA2K_MOUSE
HA2R_MOUSE
HA2U-MOUSE
HB21_MOUSE
HB23_MOUSE
HB2A_MOUSE
HB2F-MOUSE
HB2J_MOUSE
HB2Q_MOUSE
HB2U_MOUSE
ILlR-MOUSE
IL2A_MOUSE
IL4R*MOUSE
IL6R_MOUSE
IL9R-MOUSE
INSR_MOUSE
NA4_MOUSE
ITAL_MOUSE
ITBl-MOUSE
KEK4_MOUSE
KGFR-MOUSE
KLTK-MOUSE
LECI_MOUSE
LEM2-MOUSE
LEUK_MOUSE
LMPl-MOUSE
LRPA MOUSE
315-337
42-40
346-362
352-369
274-297
585-603
255-283
272-292
85-105
310-331
307133
305127
306-326
306-328
304-326
30T322
306-329
2I7-242
211,-236
19612L
221.-243
196-221
190-2L5
217148
194-21.6
227-247
21"+-234
226-248
228-247
225-745
339159
237-257
234-257
358-385
271,-291,
947-967
984-1.007
1085-1108
729J51
541-564
26+-284
422446
59-79
558-579
249-271.
371-394
1.43-1.66
380-402
12-30
373*389
168-185
126-1.49
563-581
293-321
229-249
56*76
145_766
335-361
332-354
r43-1.63
338-360
141-163
r42-1.6L
142-165
96-t21.
93-118
86-L11
100-122
8G111
83-108
9TI24
8G108
1.04-124
97-117
102-124
105-124
103-123
160-180
109-129
523446
41.0437
126-146
46H.84
481-504
1125-11.48
35+-376
259-282
122-142
65ffi80
181-201
269-290
323-345
17+-197
487-510
467
394
398
405
650
907
330
492
510
362
362
357
326
368
357
372
368
255
248
233
258
233
227
264
232
265
252
264
265
263
576
268
810
460
468
1372
1039
1163
798
983
707
888
301
612
395
406
830
41t
1, 141
72
121t11T1111111111
1, 1T1
1. 1111111t1t1IaI1IL171111
23
19
T7
18
24
19
29
2L
21.
22
27
23
21
23
23
20
24
26
26
26
26
26
32
ZJ
27
2T
23
20
21
21,
2T
24
28
21,
21.
24
24
23
24
21
25
21
22
23
N A
24
23
19
17
18
24
19
29
2l
21
22
27
23
21
23
23
20
24
26
26
26
23
26
26
32
23
21,
21,
23
20
2t
2l
2L
24
28
2l
2L
24
24
23
24
2L
25
2l
22
23
24
24
LY49-MOUSE
MAGS-MOUSE
MAS_MOUSE
MDRT-MOUSE
MDR3-MOUSE
MPRD-MOUSE
MUC
NALS-MOUSE
NK12-MOUSE
NK14-MOUSE
NK2R-MOUSE
OLF3_MOUSE
PCT-MOUSE
PERF_MOUSE
PGDS_MOUSE
PLRT-MOUSE
PM22-MOUSE
RDS-MOUSE
SCF-MOUSE
TCA-MOUSE
TCB2_MOUSE
TCC2-MOUSE
TCC4_MOUSE
TF-MOUSE
TNFA-MOUSE
TNR2-MOUSE
TRFR_MOUSE
TYR2-MOUSE
TYRR*MOUSE
UFO_MOUSE
VCAl_MOUSE
154-175
550-569
291.1I9
27c-287
79-99
82-106
220-237
312
132-1,55
130-153
130-1.49
1-25
M9469
86-102
252-275
104-127
35-61
307130
245-267
47-67
63-84
61-80
69-89
115-137
8-28
367196
171,-L92
496-514
508-531
667-689
339-360
262
582
324
1276
1276
278
476
399
223
220
3U
312
871
554
1089
303
\67
346
273
138
173
172
190
294
235
474
393
5L7
537
888
739
1
I
\
7
12
12
1
1
1
1
1
7
7
1
2
1
1
I
4
4
I
1
1
7
1
1
L
t
7
1
1
1
1
22
20
31
20
a ^
JL
25
18
20
24
24
22
20
27
17
24
24
24
24
23
21
22
20
27
23
21
30
23
t9
24
23
22
45-46
517-536
104-134
831-850
704-735
187111.
45H73
2W
4k3
39-{.2
197-218
273-292
2545
L87103
526-549
230-253
96-119
100-123
21.5-237
113-133
1.47-1.68
141-160
158-178
252174
36-56
259188
29-51.
47349L
478-501
444466
699-720
22
20
29
18
21
25
18
20
24
24
20
25
21
t7
24
24
27
24
23
21
22
20
21
23
21
30
22
19
24
23
22
Table 18.6 Tableau without ADFs for the transmembrane problem.
Objective: Find a program to classify whether or not a segment of a protein sequence is a transmembrane domain.
Terminal set without ADFs: LEN, M0, M1, M2, M3, random constants ℜbigger-reals, and 20 zero-argument functions (A?), (C?), ..., (Y?).
Function set without ADFs: ORN, SETM0, SETM1, SETM2, SETM3, IFLTE, +, -, *, and %.
Fitness cases: The in-sample set of fitness cases consists of 246 protein segments. The out-of-sample set of fitness cases consists of 250 protein segments.
Raw fitness: Correlation C (ranging from -1.0 to +1.0).
Standardized fitness: Standardized fitness is (1 - C)/2.
Hits: 100 times the difference of 1.0 minus standardized fitness for the out-of-sample set (rounded to nearest integer).
Wrapper: If the result-producing branch returns a number greater than 0, the segment is classified as a transmembrane domain; otherwise, the segment is classified as non-transmembrane.
Parameters: M = 4,000. G = 21.
Success predicate: A best-of-run program (as measured by in-sample correlation) scores an out-of-sample correlation of 0.94 or better.
correlation that were achieved in these runs and compared performance with and without automatically defined functions. After examining the evidence obtained from the actual runs about the distribution of values of correlation, we retrospectively established the following success predicate for the problem: a program is deemed successful if it is a best-of-run program (as measured by in-sample correlation) and it scores an out-of-sample correlation of 0.94 or better. We then made the performance curves as if this success predicate had been in place during the runs.
Note that whenever a predicting program is devised using a measure of in-sample performance and subsequently cross-validated using a measure of out-of-sample performance that only checks the out-of-sample correlation of the best-of-generation individuals, the possibility inherently exists that some other program that did not have the highest value of in-sample correlation may, in fact, have yielded a higher value of out-of-sample correlation than the first program. Such a program would not be identified here as the best-of-run program.
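The selection effect described in this note can be illustrated with a tiny sketch. The three records below are made-up numbers, not results from the runs; each holds a generation number, the in-sample correlation of that generation's best program, and its out-of-sample correlation.

```python
# (generation, in-sample correlation, out-of-sample correlation)
# Hypothetical values for illustration only.
history = [
    (0, 0.48, 0.43),
    (5, 0.70, 0.81),
    (9, 0.75, 0.78),
]

# The best-of-run program is chosen by in-sample correlation alone.
best_of_run = max(history, key=lambda row: row[1])

# A program with a higher out-of-sample value can be passed over.
overlooked = max(history, key=lambda row: row[2])

print(best_of_run, overlooked)
```

In this made-up history the generation 9 program is reported as best-of-run even though the generation 5 program generalizes better, which is the possibility the note flags.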
Table 18.7 Values of out-of-sample correlation for 11 runs of the transmembrane problem without ADFs.
Generation    Out-of-sample correlation
10            0.7124
6             0.7143
6             0.7143
12            0.8044
7             0.8044
13            0.8044
8             0.8044
3             0.8044
16            0.8054
14            0.8250
20            0.9448
Table 18.6 summarizes the key features of the transmembrane problem without automatically defined functions.
18.6 RESULTS WITHOUT ADFs FOR THE SUBSET-CREATING VERSION
In 11 runs of this problem without automatically defined functions, 10 of the 11 values of correlation are clustered in the unimpressive range between 0.7124 and 0.8250. There is one outlier with the reasonably good correlation of 0.9448. Table 18.7 shows, in ascending order, the out-of-sample correlation attained by the best-of-run individual (as measured by in-sample correlation) during each of the 11 runs and the generation on which the best value was achieved.
The best-of-all program from the 11 runs (with an out-of-sample correlation of 0.9448 and an in-sample correlation of 0.882) has 71 points and is shown below:
(progn (looping-over-residues
         (SETM3 (- (+ (- (F?) (K?)) (+ (- M3 (P?))
           (+ (I?) (SETM2 (SETM3 (L?)))))) (SETM2 (SETM2 (H?))))))
       (values (* (IFLTE (IFLTE (+ -5.505 M3) (* L M2) (% -2.186
         (IFLTE M1 M3 M2 M2)) (+ -5.605 M3)) (- (% L M3)
         (- M2 (+ M0 M1))) (* M2 M0) (* (% (+ M2 M3)
         (+ M3 L)) (% M2 L))) (* (+ (+ M2 M1) (* M2 M0))
         (% M2 M2))))).
If the success predicate for this problem were defined to be a value of out-of-sample correlation $C$ of 0.94 or better, only one of these 11 runs would be deemed successful with this retrospective definition.
Although calculating computational effort based on only one successful run has virtually no statistical significance, the value of $E_{without}$ computed on the basis of this single successful run is 3,724,000.
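This section does not restate how computational effort is calculated. The sketch below assumes the usual definition used elsewhere in the book: with a cumulative probability of success P by generation i, the number of independent runs needed to reach a success probability z = 0.99 is the ceiling of log(1 - z) / log(1 - P), and the number of individuals to process is M × (i + 1) × that number of runs. The generation index 18 is a hypothetical input; it is used only because it reproduces the quoted figure arithmetically, and the text does not state which generation was used.

```python
import math


def runs_required(p_success, z=0.99):
    """Independent runs needed to succeed with probability z."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(m, generation, p_success, z=0.99):
    """M x (generation + 1) x runs_required, generations counted from 0."""
    return m * (generation + 1) * runs_required(p_success, z)


r = runs_required(1 / 11)                          # one success in 11 runs
effort = individuals_to_process(4000, 18, 1 / 11)  # generation 18 is assumed
print(r, effort)
```

With one success in 11 runs the sketch calls for 49 runs, and 4,000 × 19 × 49 equals 3,724,000.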
Calculating the "average" structural complexity, $S_{without}$, based on only one successful run is similarly suspect. This "average" of 71 is especially suspect since it is smaller than the average number of points (75.2) for the best-of-run programs from the 10 unsuccessful runs in table 18.7. Thus, there is a good chance that the true value of $S_{without}$ is higher than 71.
Since the probability of success is only 1:11 without automatically defined functions, computation of a performance curve would entail a great many runs (perhaps a hundred or more). Even with the compromise of setting G (the maximum number of generations to be run) to only 21, a single run of this problem takes one and a half days on our computer. It is apparent that any attempt to produce a performance curve for this problem without automatically defined functions would consume a prohibitive amount of computer time. Consequently, we decided to concentrate our available resources on runs of this problem employing automatically defined functions. As will be seen below, the runs with automatically defined functions did considerably better than the runs without them.
18.7 PREPARATORY STEPS WITH ADFs FOR THE SUBSET-CREATING VERSION
The programs without automatically defined functions for the transmembrane problem were unusually opaque because they combined the residue-detecting functions, the disjunctive function, the arithmetic operations, and the conditional branching operator into one branch.
Automatically defined functions seem well suited to the task of interrogating the residues and organizing the information in some way (e.g., categorizing the residues into categories). The isolation of this task in the function-defining branches permits the iteration-performing branch to concentrate on the task of iteratively performing arithmetic calculations and conditional operations. As before, the result-producing branch performs arithmetic calculations and conditional operations to make the final decision.
Thus, we decided that the overall architecture of the predicting programs would consist of three automatically defined functions (detectors for categorization), an iteration-performing branch for performing arithmetic operations and conditional operations using the as-yet-undiscovered detectors, and a result-producing branch for performing arithmetic operations and conditional operations using the results of the as-yet-undiscovered iteration to classify the given protein segment as a transmembrane domain or a non-transmembrane area.
Figure 18.5 shows an abbreviated version of the architecture for an overall predicting program for the subset-creating version of the transmembrane problem with automatically defined functions. The overall program has five branches, three of which are ADFs; however, only one ADF is shown here to save space. In addition to the three ADFs, the overall program has an iteration-performing branch, IPB0, and a result-producing branch, RPB.
Figure 18.5 Overall program consisting of an automatically defined function, ADF0, an iteration-performing branch, IPB0, and a result-producing branch, RPB.
Having now determined the architecture for the overall program, we now
consider the ingredients from which each branch of the overall program will
be composed.
The terminal set, $\mathcal{T}_{adf}$, for each of the three function-defining branches ADF0, ADF1, and ADF2 contains the 20 zero-argument numerically-valued residue-detecting functions:
$$\mathcal{T}_{adf} = \{\texttt{(A?)}, \texttt{(C?)}, \ldots, \texttt{(Y?)}\}.$$
The function set, $\mathcal{F}_{adf}$, for each of the three function-defining branches, ADF0, ADF1, and ADF2, contains only the two-argument numerically-valued disjunctive function:
$$\mathcal{F}_{adf} = \{\texttt{ORN}\}$$
with an argument map of
$$\{2\}.$$
In this problem, the function-defining branches do not refer hierarchically to one another, so all three function-defining branches have identical terminal sets, function sets, and argument maps. In implementing structure-preserving crossover in this situation, one might assign one common type to all three like branches (i.e., like-branch typing) or one might assign three separate types to the three branches (i.e., branch typing). We have chosen to continue to use our usual branch typing for this problem.
The terminal set, $\mathcal{T}_{ipb0}$, for the iteration-performing branch IPB0 is
$$\mathcal{T}_{ipb0} = \{\texttt{M0}, \texttt{M1}, \texttt{M2}, \texttt{M3}, \texttt{LEN}, \Re_{\text{bigger-reals}}\},$$
where M0, M1, M2, and M3 are settable variables and where LEN is the length of the current protein.
Since a numerical calculation is to be performed on the results of the categorization performed by the function-defining branches, ADF0, ADF1, and ADF2 are included in the function set for the iteration-performing branch.
The function set, $\mathcal{F}_{ipb0}$, for the iteration-performing branch is
$$\mathcal{F}_{ipb0} = \{\texttt{ADF0}, \texttt{ADF1}, \texttt{ADF2}, \texttt{SETM0}, \texttt{SETM1}, \texttt{SETM2}, \texttt{SETM3}, \texttt{IFLTE}, +, -, *, \%\}$$
with an argument map of
$$\{0, 0, 0, 1, 1, 1, 1, 4, 2, 2, 2, 2\}.$$
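As before, the terminal set, $\mathcal{T}_{rpb}$, for the result-producing branch, RPB, is
$$\mathcal{T}_{rpb} = \{\texttt{LEN}, \texttt{M0}, \texttt{M1}, \texttt{M2}, \texttt{M3}, \Re_{\text{bigger-reals}}\}.$$
Similarly, as before, the function set, $\mathcal{F}_{rpb}$, for the result-producing branch is
$$\mathcal{F}_{rpb} = \{\texttt{IFLTE}, +, -, *, \%\}$$
with an argument map of
$$\{4, 2, 2, 2, 2\}.$$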
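One way to picture the function sets and argument maps above is as per-branch tables of arities that a program tree can be checked against. The sketch below is an assumption-laden illustration: the nested-tuple encoding of S-expressions and the helper names `points` and `well_formed` are hypothetical, and no semantics of the functions are modeled.

```python
# Arity tables mirroring the function sets and argument maps in the text.
ARITY_ADF = {"ORN": 2}
ARITY_IPB0 = {
    "ADF0": 0, "ADF1": 0, "ADF2": 0,
    "SETM0": 1, "SETM1": 1, "SETM2": 1, "SETM3": 1,
    "IFLTE": 4, "+": 2, "-": 2, "*": 2, "%": 2,
}
ARITY_RPB = {"IFLTE": 4, "+": 2, "-": 2, "*": 2, "%": 2}


def points(expr):
    """Count the functions and terminals in a nested-tuple expression."""
    if not isinstance(expr, tuple):
        return 1                      # a terminal such as M0 or LEN
    return 1 + sum(points(arg) for arg in expr[1:])


def well_formed(expr, arity):
    """True if every function call is in the table with the right arity."""
    if not isinstance(expr, tuple):
        return True
    head, args = expr[0], expr[1:]
    return arity.get(head) == len(args) and all(
        well_formed(arg, arity) for arg in args)


# (SETM0 (SETM3 (SETM0 (ADF0)))) written as nested tuples.
ipb = ("SETM0", ("SETM3", ("SETM0", ("ADF0",))))
print(points(ipb), well_formed(ipb, ARITY_IPB0), well_formed(ipb, ARITY_RPB))
```

The same expression is legal in the iteration-performing branch but not in the result-producing branch, whose function set contains neither the ADFs nor the setting functions.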
Table 18.8 summarizes the key features of the subset-creating version of
the transmembrane problem with automatically defined functions.
18.8 RESULTS WITH ADFs FOR THE SUBSET-CREATING VERSION
In this section, we will see that automatically defined functions greatly facilitate solution of this problem. The correlations will prove to be higher than when automatically defined functions are not used. Moreover, these higher values of correlation will be achieved on many different runs.
We first examine run 1, the second best run with automatically defined functions.
In this run, the vast majority of the randomly generated programs in the initial random population (generation 0) have a 0.0 or near-zero correlation, $C$, indicating that they are no better than random in recognizing whether a protein segment is a transmembrane domain. Many of these programs achieve their poor performance because the result-producing branch returns the same value regardless of the composition of the protein segment presented. This occurs for various reasons. Sometimes the iteration-performing branch entirely ignores the three automatically defined functions (thus totally disconnecting the iteration-performing branch and the result-producing branch from the input of the problem). In other programs, the settable variables are either not set at all or effectively set to a constant value. And in other programs, the result-producing branch ignores the settable variables. The net effect is that these programs with zero correlation classify all segments the same and achieve 123 true positives and 123 false positives (or 123 true negatives and 123 false negatives) over the 246 in-sample fitness cases. For these random programs with zero correlation, the accuracy measure, $Q_3$, is 0.5, the error measure, $E_3$, is 0.5, the percentage of agreement, $c_\alpha$, is 100%, the measure of overprediction, $c_{n\alpha}$, is 0% (or vice versa), and the $Q_\alpha$ measure is 50%.
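As a check on these figures, the arithmetic for the all-positive case can be written out using the definitions given earlier in the chapter. Here $N_{tp} = 123$, $N_{fp} = 123$, $N_{tn} = 0$, $N_{fn} = 0$, and $N_{fc} = 246$; the percentage of agreement is written $c_\alpha$ and its negative-case counterpart $c_{n\alpha}$.

```latex
\begin{align*}
Q_3 &= \frac{123 + 0}{246} = 0.5, &
E_3 &= \frac{123 + 0}{246} = 0.5, \\
c_\alpha &= \frac{100 \cdot 123}{123 + 0} = 100\%, &
c_{n\alpha} &= \frac{100 \cdot 0}{0 + 123} = 0\%, \\
Q_\alpha &= \frac{100 + 0}{2} = 50\%. &&
\end{align*}
% Correlation: the numerator is 123 \cdot 0 - 0 \cdot 123 = 0, and the
% denominator \sqrt{(0+0)(0+123)(123+0)(123+123)} is also 0, so C is set
% to 0 by the convention for a zero denominator.
```

The all-negative case gives the same values with the two percentages exchanged.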
The best-of-generation predicting program from generation 0 of run 1 has an in-sample correlation of 0.48 and a standardized fitness of 0.26 as a result of getting 99 true positives, 83 true negatives, 40 false positives, and 24 false negatives over the 246 in-sample fitness cases. When tested on the out-of-sample set, this 82-point program has an out-of-sample correlation of 0.43 and an out-of-sample standardized fitness of 0.28 as a result of getting 94 true positives, 85 true negatives, 40 false positives, and 31 false negatives over the 250 out-of-sample fitness cases. It scores 72 hits and is shown below:
(progn (defun ADF0 ()
         (values (ORN (ORN (ORN (I?) (M?)) (ORN (V?) (C?)))
                      (ORN (ORN (W?) (L?)) (ORN (Y?) (A?))))))
       (defun ADF1 ()
         (values (ORN (ORN (ORN (L?) (L?)) (ORN (R?) (K?)))
                      (ORN (ORN (I?) (V?)) (ORN (R?) (Q?))))))
       (defun ADF2 ()
         (values (ORN (ORN (ORN (R?) (S?)) (ORN (F?) (A?)))
                      (ORN (ORN (P?) (F?)) (ORN (Y?) (C?))))))
       (progn (looping-over-residues
                (SETM0 (SETM3 (SETM0 (ADF0)))))
              (values (IFLTE (+ (- M3 M0) (+ M1 M3))
                             (% (IFLTE M0 M3 6.212 M1) (IFLTE M0 M2 M1 L))
                             (* (% M1 M2) (* M3 0.419))
                             (+ (% L M2) (- M0 M2)))))).
Table 18.8 Tableau with ADFs for the subset-creating version of the transmembrane problem.
Objective: Find a program to classify whether or not a segment of a protein sequence is a transmembrane domain.
Architecture of the overall program with ADFs: One result-producing branch, one iteration-performing branch, and three zero-argument function-defining branches, with no ADF hierarchically referring to any other ADF.
Parameters: Branch typing among the three automatically defined functions.
Terminal set for the iteration-performing branch: LEN, M0, M1, M2, M3, and the random constants ℜbigger-reals.
Function set for the iteration-performing branch: ADF0, ADF1, ADF2, SETM0, SETM1, SETM2, SETM3, IFLTE, +, -, *, and %.
Terminal set for the result-producing branch: LEN, M0, M1, M2, M3, and the random constants ℜbigger-reals.
Function set for the result-producing branch: IFLTE, +, -, *, and %.
Terminal set for the function-defining branches ADF0, ADF1, and ADF2: Twenty zero-argument functions (A?), (C?), ..., (Y?).
Function set for the function-defining branches ADF0, ADF1, and ADF2: Numerically valued two-argument logical disjunction function ORN.
In examining this program from run 1, we see that ADF0 returns 1 for any
amino acid residue from the defined set {I, M, V, C, W, L, Y, A}. Six of these
eight residues are hydrophobic and two (W and Y) are neutral according to
the categories shown in table 18.1 (the Kyte-Doolittle hydrophobicity scale).
In other words, ADF0 is an imperfect detector of hydrophobic residues in that
it omits one of the hydrophobic residues (F) and includes two neutral residues
(W and Y). Note that this interpretation of ADF0 is based on our knowledge of
the Kyte-Doolittle hydrophobicity scale (table 18.1) and the three
categories that can reasonably be induced from that table using clustering
techniques; genetic programming does not have access to these Kyte-Doolittle
values or the three categories.
The iteration-performing branch of the best of generation 0 refers only to
ADF0. The iteration-performing branch sets the settable variable M0 to the
value of ADF0 and sets the settable variable M3 to the value of M0. Since this
branch only writes values into M3 and M0 (and does not ever read M3 or M0),
the final values of M3 and M0 after the iteration over the entire protein
segment are merely the value of ADF0 for the very last residue of the protein
segment. In other words, M3 and M0 are both 1 if the last residue is in the
particular subset of residues designated by ADF0. Since ADF0 is an imperfect
hydrophobicity detector, the final values of M3 and M0 are usually, but not reliably,
1 if the last residue is hydrophobic. Since the settable variables M1 and M2 are
not referenced by the iteration-performing branch, they both remain at their
initial values of 0.
The result-producing branch can therefore be simplified to
(IFLTE M0 (% 6.212 (IFLTE M0 0 0 LEN)) (* M0 0.419) (+ 1 M0)).
If M0 is 0, the best of generation 0 returns 0 (which the wrapper interprets
as non-transmembrane); but if M0 is 1, then this program returns 2 (which the
wrapper interprets as transmembrane). The entire protein segment is classified
as being transmembrane or not on the basis of whether the last residue of
the segment is in the imperfect subset defined by ADF0. For example, this
program happens to correctly classify the segment consisting of residues
96-119 of mouse peripheral myelin protein 22 (table 18.2) as a transmembrane
domain because residue 119 is valine (V). However, because residue 61
of the negative case shown in table 18.3 is tryptophan (W), this program
incorrectly classifies the 27-residue segment from positions 35-61 of this same
protein as a transmembrane domain.
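The behavior just described can be checked with a short sketch. The Python below is an illustrative transcription, not the book's implementation. It assumes that ADF0 yields 1 for a residue in its set and 0 otherwise (as in the analysis above), that % is protected division returning 1 on a zero divisor, that IFLTE returns its third argument when its first is less than or equal to its second, and that the wrapper reads a positive result as transmembrane. The function names and the two test segments are hypothetical.

```python
# Residues for which the generation-0 ADF0 returns 1 (set quoted in the text).
ADF0_SET = set("IMVCWLYA")


def pdiv(a, b):
    """Protected division (%): assumed to return 1 when the divisor is 0."""
    return 1.0 if b == 0 else a / b


def iflte(a, b, c, d):
    """IFLTE on already-evaluated values: c if a <= b, otherwise d."""
    return c if a <= b else d


def gen0_output(segment):
    """Value returned by the generation-0 program for one protein segment."""
    m0 = m1 = m2 = m3 = 0.0          # settable variables start at 0
    length = float(len(segment))     # the terminal LEN
    for residue in segment:          # iteration-performing branch
        adf0 = 1.0 if residue in ADF0_SET else 0.0
        m0 = adf0                    # innermost (SETM0 (ADF0))
        m3 = m0                      # (SETM3 ...)
        m0 = m3                      # outermost (SETM0 ...)
    # Result-producing branch, transcribed term by term.
    return iflte(
        (m3 - m0) + (m1 + m3),
        pdiv(iflte(m0, m3, 6.212, m1), iflte(m0, m2, m1, length)),
        pdiv(m1, m2) * (m3 * 0.419),
        pdiv(length, m2) + (m0 - m2),
    )


def is_transmembrane(segment):
    """Wrapper: a positive value is read as 'transmembrane'."""
    return gen0_output(segment) > 0


# Hypothetical 21-residue segments that differ in their last residue.
ends_in_v = "GSTDEKRNQH" * 2 + "V"
ends_in_d = "L" * 20 + "D"
print(gen0_output(ends_in_v), is_transmembrane(ends_in_v))
print(gen0_output(ends_in_d), is_transmembrane(ends_in_d))
```

Under these assumptions only the final residue matters: a segment of hydrophilic residues ending in V is accepted, and a segment of leucines ending in D is rejected. The value 2 arises only when LEN exceeds 6.212; for shorter segments the same program would return 0.419 times M0, which has the same sign.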
This program is highly flawed, since it myopically looks at only a single
residue of the protein segment in making an ill-advised decision based on a
defective ADF0. However, this program is better than any of the other 3,999
programs in the population at generation 0.
Figure 18.6 Fitness curves (worst of generation, average, and best of generation,
by generation) for run 1 of the subset-creating version of the transmembrane
problem.
The worst-of-generation predicting program from generation 0 of run 1
has an in-sample correlation of -0.40. Like the best-of-generation program, it
myopically looks at only one residue in the protein segment and then creates
a highly imperfect hydrophobicity detector ADF1. This program achieves its
negative value of correlation by then using its incomplete information in
precisely the wrong way.
Figure 18.6 shows the fitness curves for this run. At generation 0, the
in-sample correlation of the best-of-generation program is 0.48 and the
standardized fitness is 0.26. The in-sample correlation of the worst-of-generation
program is -0.40 and its standardized fitness is 0.70.
In generation 2 of run 1, the best-of-generation program achieves an
incrementally better correlation (0.496 in-sample and 0.472 out-of-sample) by
virtue of an incremental change consisting of just one point in the definition
of ADF0.
There is a major qualitative change in the best of generation 5. The best of
generation 5 is the first best-of-generation program in this run that makes its
prediction based on the entire protein segment. This 62-point program has a
distinctly better in-sample correlation of 0.764, an out-of-sample correlation
of 0.78, a standardized fitness of 0.12, and scores 89 hits.
(progn (defun ADF0 ()
         (values (ORN (ORN (I?) (A?)) (ORN (ORN (L?) (G?)) (N?)))))
       (defun ADF1 ()
         (values (ORN (ORN (ORN (ORN (G?) (D?)) (ORN (E?) (V?)))
                           (ORN (ORN (R?) (E?)) (ORN (T?) (P?))))
                      (ORN (N?) (S?)))))
       (defun ADF2 ()
         (values (ORN (ORN (ORN (L?) (R?)) (ORN (V?) (P?)))
                      (ORN (G?) (L?)))))
       (progn (looping-over-residues
                (SETM1 (- (+ M1 (ADF0)) (ADF1))))
              (values (* (% (+ (% -9.997 M3) M1) 6.602)
                         (+ 6.738 (% (- M3 LEN) (+ M3 M2)))))))
The iteration-performing branch of this program uses the settable variable
M1 to create a running sum of the difference between two quantities.
Specifically, as the iteration-performing branch is iteratively executed over the
protein segment, M1 is set to the current value of M1 plus the difference between
ADF0 and ADF1. ADF0 consists of nested ORNs involving the three hydrophobic
residues (I, A, and L), one neutral residue (G), and one hydrophilic residue
(N). ADF1 consists of nested ORNs involving one hydrophobic residue (V),
four neutral residues (G, T, P, and S), and four hydrophilic residues
(D, E, R, and N).
Because the neutral G residue and the hydrophilic N residue appear in
both ADF0 and ADF1, there is no net effect on the running sum of the
differences, M1, calculated by the iteration-performing branch when the current
residue is either G or N. There is a positive contribution (from ADF0) to the
running sum M1 only when the current residue is I, A, or L (all of which are
hydrophobic), and there is a negative contribution (from ADF1) to the running
sum M1 only when the current residue is D, E, or R (all of which are
hydrophilic). The running sum M1 is a count (based on a sample of only three
of the seven hydrophobic residues and only three of the seven hydrophilic
residues) of the excess of hydrophobic residues over hydrophilic residues.
When simplified, the result-producing branch is equivalent to
$$1.17 \times (\mathrm{M1} + 1),$$
so the protein segment is classified as a transmembrane domain whenever
M1 is greater than -1. In other words, whenever the number of occurrences of
the three particular hydrophobic residues (I, A, and L) equals or exceeds the
number of occurrences of the three particular hydrophilic residues (D, E, and
R), the segment is classified as a transmembrane domain. This relatively simple
calculation is a very imperfect predictor of transmembrane domains, but it is
correct more often than its ancestors.
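A minimal Python sketch of this generation-5 program follows, under stated assumptions rather than as the book's code: each ADF is modeled as a 0/1 set-membership test, % is protected division returning 1 on a zero divisor, and the garbled operator inside the second factor is taken to be %, which reproduces the constant 7.738 / 6.602 ≈ 1.17. The sets are read from the listing, whose ADF1 also contains V, T, P, and S; these subtract from the sum as well, a detail the simplified description sets aside. All function names are hypothetical.

```python
ADF0_SET = set("IALGN")        # generation-5 ADF0, as read from the listing
ADF1_SET = set("GDEVRTPNS")    # generation-5 ADF1, as read from the listing


def pdiv(a, b):
    """Protected division (%): assumed to return 1 when the divisor is 0."""
    return 1.0 if b == 0 else a / b


def running_sum(segment):
    """M1 after the iteration-performing branch has visited every residue."""
    m1 = 0.0
    for residue in segment:
        adf0 = 1.0 if residue in ADF0_SET else 0.0
        adf1 = 1.0 if residue in ADF1_SET else 0.0
        m1 = (m1 + adf0) - adf1   # (SETM1 (- (+ M1 (ADF0)) (ADF1)))
    return m1


def gen5_output(segment):
    """Result-producing branch; M2 and M3 are never set, so they stay at 0."""
    m1 = running_sum(segment)
    m2 = m3 = 0.0
    length = float(len(segment))
    return pdiv(pdiv(-9.997, m3) + m1, 6.602) * (6.738 + pdiv(m3 - length, m3 + m2))


for segment in ("IALDE", "GNGN", "D", "DER"):   # hypothetical toy segments
    print(segment, running_sum(segment), gen5_output(segment) > 0)
```

Residues G and N leave M1 unchanged because both sets contain them, and the output is positive exactly when M1 is greater than -1.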
In generation 6 of run 1, the best-of-generation program has marginally
better values for correlation (0.766 in-sample and 0.834 out-of-sample) by
virtue of a small evolutionary change in the definition of ADF1.
The 62-point best of generation 8 of run 1 exhibits a substantial jump in
performance over all its predecessors from previous generations. The
improvement arises from a small change in ADF0 and a major change in ADF1.
In-sample correlation rises to 0.92; out-of-sample correlation rises to 0.89. Hits
rise to 94.
(progn (defun ADF0 ()
         (values (ORN (ORN (ORN (I?) (M?)) (ORN (V?) (C?)))
                      (ORN (ORN (L?) (G?)) (N?)))))
       (defun ADF1 ()
         (values (ORN (ORN (ORN (ORN (G?) (D?)) (ORN (E?) (V?)))
                           (ORN (ORN (R?) (E?)) (ORN (T?) (P?))))
                      (ORN (N?) (S?)))))
       (defun ADF2 ()
         (values (ORN (ORN (ORN (L?) (Y?)) (ORN (V?) (P?)))
                      (ORN (G?) (L?)))))
       (progn (looping-over-residues
                (SETM1 (- (+ M1 (ADF0)) (ADF1))))
              (values (* (+ M1 M3)
                         (+ 6.738 (% (- M3 LEN) (+ M3 M2)))))))
In this program, ADF0 tests for four (I, M, C, and L) of the seven hydrophobic
residues, instead of three. Moreover, isoleucine (I), the most hydrophobic
residue among the seven hydrophobic residues on the Kyte-Doolittle scale, is
one of the residues incorporated into ADF0. More important, ADF1 tests for
three neutral residues (T, P, and S) as well as three hydrophilic residues (D, E,
and R). The result-producing branch calculates 7.738 M1.
Here, a protein segment is classified as a transmembrane domain
whenever the running sum M1 is positive (rather than merely greater than -1,
as in generation 5).
The three neutral residues (T, P, and S) in ADF1 play an important role since
a positive value of M1 can be achieved only if there are enough sampled
hydrophobic residues in the segment to counterbalance the sum of the number
of occurrences of the three hydrophilic residues plus the number of
occurrences of the three neutral residues.
In generation 11 of run 1, the 78-point best-of-generation program has an
in-sample correlation of 0.94 and a standardized fitness of 0.03 as a result of
getting 117 true positives, 122 true negatives, 1 false positive, and 6 false
negatives over the 246 in-sample fitness cases. It has an out-of-sample correlation
of 0.96 and a standardized fitness of 0.02 as a result of getting 122 true
positives, 123 true negatives, 2 false positives, and 3 false negatives over the 250
out-of-sample fitness cases. This program scores 98 hits; its out-of-sample
error rate is only 2.0%.
(progn (defun ADF0 ()
         (values (ORN (ORN (ORN (I?) (M?)) (ORN (V?) (C?)))
                      (ORN (ORN (L?) (G?)) (N?)))))
       (defun ADF1 ()
         (values (ORN (ORN (ORN (ORN (G?) (D?)) (ORN (E?) (V?)))
                           (ORN (ORN (R?) (E?))
                                (ORN (ORN (ORN (ORN (G?) (D?)) (ORN (E?) (V?)))
                                          (ORN (ORN (R?) (K?)) (ORN (T?) (P?))))
                                     (ORN (N?) (S?)))))
                      (ORN (N?) (S?)))))
       (defun ADF2 ()
         (values (ORN (ORN (ORN (L?) (Y?)) (ORN (V?) (P?)))
                      (ORN (G?) (L?)))))
       (progn (looping-over-residues
                (SETM1 (- (+ M1 (ADF0)) (ADF1))))
              (values (* (+ M1 M3)
                         (+ 6.738 (% (- M3 LEN) (+ M3 M2)))))))
This program from generation 11 of run 1 is similar to, but better than, the
best of generation 8. The definition of ADF0, the calculation of the running
sum M1 in the iteration-performing branch, and the final calculation in the
result-producing branch are the same. However, ADF1 differs from the best
of generation 8 in that it tests for four of the seven hydrophilic residues (D, E,
R, and K), not just three. Moreover, D, E, R, and K are the most hydrophilic
residues from among the seven hydrophilic residues according to the
Kyte-Doolittle scale. Consequently, a majority of the hydrophobic residues are
effectively tested by ADF0 and a majority of the hydrophilic residues (along
with three neutral residues) are tested by ADF1. This change makes the running
sum M1 even better at predicting whether a protein segment is in a
transmembrane domain.
The seven sampled hydrophilic and neutral residues in ADF1 account for
40.7% of the residues in all proteins and the four sampled hydrophobic
residues in ADF0 account for 18.3% (Creighton 1993). Roughly speaking, a
transmembrane prediction by the result-producing branch is triggered by a
reduction in the normal ratio of better than 2:1 to a ratio of just below 1:1.
The operation of this program from generation 11 of run 1 can be
summarized as follows: If the number of occurrences of I, M, C, and L in a given
protein segment exceeds the number of occurrences of D, E, R, K, T, P, and S,
then classify the segment as a transmembrane domain; otherwise, classify it
as non-transmembrane.
As before, the residues V, G, and N play no role in the calculation of the
running sum M1 since they appear in both ADF0 and ADF1.
Table 18.9 shows, by generation, on both an in-sample and an out-of-sample
basis, the correlation C, the number of true positives, Ntp, the number of true
negatives, Ntn, the number of false positives, Nfp, and the number of false
negatives, Nfn, for the best-of-generation programs of run 1. The number of
hits (out-of-sample) is also shown in the last column. As can be seen, the best
value of out-of-sample correlation is first achieved on generation 11.
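The relation between the four counts and the correlation can be sketched in a few lines of Python. This is a hedged illustration, not the book's code: it assumes that C is the usual correlation coefficient for a two-by-two classification table (often called the Matthews coefficient), that a zero denominator is reported as zero correlation, and that standardized fitness is (1 - C) / 2, which is consistent with the figures quoted in this chapter. The function names are hypothetical.

```python
import math


def correlation(tp, tn, fp, fn):
    """Correlation C computed from the four classification counts."""
    denominator = math.sqrt((tn + fn) * (tn + fp) * (tp + fn) * (tp + fp))
    if denominator == 0:
        return 0.0   # assumed convention when every segment gets the same label
    return (tp * tn - fp * fn) / denominator


def standardized_fitness(c):
    """Assumed mapping of correlation onto [0, 1], with 0 the best value."""
    return (1.0 - c) / 2.0


def error_rate(tp, tn, fp, fn):
    """Fraction of fitness cases that are misclassified."""
    return (fp + fn) / (tp + tn + fp + fn)


# Generation 11 of run 1, out-of-sample counts quoted in the text.
c_out = correlation(122, 123, 2, 3)
print(round(c_out, 2), round(standardized_fitness(c_out), 2), error_rate(122, 123, 2, 3))
```

Applied to the in-sample counts for the same generation (117, 122, 1, and 6), the same functions give a correlation that rounds to 0.94 and a standardized fitness that rounds to 0.03.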
After generation 11, the in-sample performance of the best-of-generation
program continues to improve. For example, the in-sample correlation
improves from 0.94 to 0.98 between generations 11 and 18, and the number of
in-sample errors (i.e., false positives plus false negatives) drops from seven to
three. Specifically, the number of false negatives drops from six to two between
generations 11 and 18 while the number of false positives remains at one. The
best of generation 18 differs from the best of generation 11 in that two of its
function definitions are somewhat different. As a result of these changes, the
best of generation 18 correctly classifies the designated segments of the
proteins B2AR-MOUSE, THRR-MOUSE, GAC2-MOUSE, and GAD-MOUSE
as transmembrane domains, whereas the best of generation 11 erred on these
four segments.
Table 18.9 In-sample and out-of-sample correlation of best-of-generation programs
from run 1 of the subset-creating version of the transmembrane problem.

             In-sample                     Out-of-sample
Generation   C     Ntp  Ntn  Nfp  Nfn     C     Ntp  Ntn  Nfp  Nfn  Hits
 0           0.48   99   83   40   24     0.43   94   85   40   31    72
 1           0.48   99   83   40   24     0.43   94   85   40   31    72
 2           0.50   92   92   31   31     0.47   91   93   32   34    74
 3           0.50   92   92   31   31     0.47   91   93   32   34    74
 4           0.50  110   72   51   13     0.54  115   74   51   10    76
 5           0.76  107  110   13   16     0.78  110  113   12   15    89
 6           0.77  113  104   19   10     0.83  119  110   15    6    92
 7           0.77  113  104   19   10     0.83  119  110   15    6    92
 8           0.92  117  119    4    6     0.89  122  114   11    3    94
 9           0.92  117  119    4    6     0.89  122  114   11    3    94
10           0.92  122  114    9    1     0.83  122  106   19    3    91
11           0.94  117  122    1    6     0.96  122  123    2    3    98
12           0.94  117  122    1    6     0.96  122  123    2    3    98
13           0.95  119  121    2    4     0.94  123  119    6    2    97
14           0.95  118  122    1    5     0.93  117  124    1    8    96
15           0.96  120  121    2    3     0.92  123  117    8    2    96
16           0.96  119  122    1    4     0.93  118  123    2    7    96
17           0.96  119  122    1    4     0.93  118  123    2    7    96
18           0.98  121  122    1    2     0.94  121  122    3    4    97

However, this apparent improvement in run 1 after generation 11 is due to
overfitting. Genetic programming is relentlessly driven to achieve better and
better values of fitness. Fitness for this problem is based on the value of the
correlation for the predictions made by the genetically evolved program on
the in-sample set of fitness cases. However, the true measure of performance
for a predicting algorithm is how well it generalizes to the previously unseen
out-of-sample data. In this run, the out-of-sample correlation drops from 0.96
to 0.94 between generations 11 and 18, and the number of out-of-sample
errors increases from five to seven (one additional false positive and one
additional false negative). The maximum value of out-of-sample correlation
is attained at generation 11. After generation 11, the evolved predicting
programs are being fitted more and more to the idiosyncrasies of the particular
in-sample fitness cases employed in the computation of fitness. The
predicting programs after generation 11 are not getting better at predicting whether
protein segments are transmembrane domains. They are merely getting
better at memorizing the in-sample data.
Figure 18.7 compares, for generations 8 through 18, the in-sample correlation
and the out-of-sample correlation for run 1. As can be seen, the out-of-sample
correlation peaks at 0.96 on generation 11, but the in-sample correlation
increases over the range of this figure. Overfitting is occurring after
generation 11.
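One way to make the overfitting argument concrete is to pick, after the fact, the generation whose best-of-generation program scores highest on the out-of-sample data. The Python sketch below does this with the two correlation columns as read from table 18.9 for generations 0 through 18. It is only an illustration of the comparison; the helper name is hypothetical, and nothing here implies that the run itself used the out-of-sample data for selection.

```python
# Correlation of the best-of-generation program, generations 0..18 of run 1,
# as read from table 18.9.
IN_SAMPLE = [0.48, 0.48, 0.50, 0.50, 0.50, 0.76, 0.77, 0.77, 0.92, 0.92,
             0.92, 0.94, 0.94, 0.95, 0.95, 0.96, 0.96, 0.96, 0.98]
OUT_OF_SAMPLE = [0.43, 0.43, 0.47, 0.47, 0.54, 0.78, 0.83, 0.83, 0.89, 0.89,
                 0.83, 0.96, 0.96, 0.94, 0.93, 0.92, 0.93, 0.93, 0.94]


def first_best_generation(values):
    """Index of the first generation that attains the maximum value."""
    return values.index(max(values))


best_out = first_best_generation(OUT_OF_SAMPLE)
best_in = first_best_generation(IN_SAMPLE)
print("out-of-sample peak at generation", best_out)
print("in-sample peak at generation", best_in)
print("out-of-sample change after the peak:",
      round(OUT_OF_SAMPLE[-1] - OUT_OF_SAMPLE[best_out], 2))
```

The in-sample column keeps rising to the last generation shown, while the out-of-sample column peaks earlier and then falls back, which is the signature of overfitting described above.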
Figure 18.7 Comparison of values of in-sample and out-of-sample correlation between
generations 8 and 18 for run 1 of the subset-creating version of the transmembrane problem.
The hits measure (which is based on the out-of-sample correlation) also
reflects the fact that the best result is obtained at generation 11. There are 98
hits at generation 11 but only 97 at generation 18.
The continuation of run 1 out to generation 50 produces no result better
than that attained at generation 11.
An examination of two other runs provides additional insights into this
problem.
Run 2 is interesting because it achieves a surprisingly high out-of-sample
correlation of 0.93 with its iteration-performing branch calling only one
automatically defined function, ADF1. This best-of-run program emerged in
generation 14 of run 2.
(progn (defun ADF0 ()
         (values (ORN (L?) (E?))))
       (defun ADF1 ()
         (values (ORN (ORN (E?) (K?)) (ORN (ORN (P?) (D?)) (R?)))))
       (defun ADF2 ()
         (values (ORN (ORN (F?) (I?)) (ORN (D?) (R?)))))
       (progn (loop-over-residues
                (SETM1 (SETM2 (+ M2 (ADF1)))))
              (values (% (- (- M1 1.434) 1.434)
                         (% (* M2 M2) (+ M0 -2.836))))))
In this program from run 2, M1 and M2 are equal; both are the running sum
of the values returned by ADF1. ADF1 returns 1 if the current residue is E, K,
D, R (i.e., the four most hydrophilic residues) or P (a neutral residue).
Substituting M1 for M2, and deleting M0 (which always equals 0), the
result-producing branch is equivalent to
$$\frac{-2.836\,(\mathrm{M1} - 2.868)}{\mathrm{M1}^{2}}.$$
This expression can be positive (indicating a transmembrane domain) only if
there are fewer than three residues in the segment from the set {E, K, D, R, P}.
Since E, K, D, and R represent only about half of the occurrences of
hydrophilic residues in a typical protein, this expression is, roughly speaking, a test
that the number of hydrophilic residues in the segment is fewer than about
six. This program implicitly exploits the fact that the average length of
the protein segment in our set of fitness cases is 21.7 in establishing six as a
threshold.

Figure 18.8 Fitness curves for run 3 of the subset-creating version of the transmembrane
problem.
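The threshold behavior of the run 2 program discussed above can be reproduced with a small Python sketch. It is an illustration under assumptions, not the book's code: ADF1 is modeled as a 0/1 test for membership in {E, K, D, R, P}, % is protected division returning 1 on a zero divisor, and a positive result is read as transmembrane. The names and test segments are hypothetical.

```python
RUN2_ADF1_SET = set("EKPDR")   # residues for which the run-2 ADF1 returns 1


def pdiv(a, b):
    """Protected division (%): assumed to return 1 when the divisor is 0."""
    return 1.0 if b == 0 else a / b


def run2_output(segment):
    """Value of the run-2 result-producing branch for one segment."""
    m0 = m1 = m2 = 0.0
    for residue in segment:
        adf1 = 1.0 if residue in RUN2_ADF1_SET else 0.0
        m2 = m2 + adf1   # (SETM2 (+ M2 (ADF1)))
        m1 = m2          # (SETM1 ...) receives the value just stored in M2
    return pdiv((m1 - 1.434) - 1.434, pdiv(m2 * m2, m0 + -2.836))


# Ten leucines followed by k glutamic acid residues.
for k in range(5):
    segment = "L" * 10 + "E" * k
    print(k, run2_output(segment) > 0)
```

With zero, one, or two residues from the set the output is positive; from three upward it is negative, matching the "fewer than three" reading of the simplified expression.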
Run 3 produced the best-of-all program for any run of the subset-creating
version of the transmembrane problem.
Figure 18.8 shows the fitness curves for run 3. At generation 0, the in-sample
correlation of the best-of-generation program is 0.404 and the standardized
fitness is 0.298. At generation 20, the in-sample correlation is 0.976 and the
standardized fitness is 0.0122.
This high correlation is achieved on generation 20 of run 3 by a program
with an in-sample correlation of 0.976 resulting from getting 121 true
positives, 122 true negatives, 1 false positive, and 2 false negatives over the 246
in-sample fitness cases. Its out-of-sample correlation of 0.968 is the result of
getting 123 true positives, 123 true negatives, 2 false positives, and 2 false
negatives over the 250 out-of-sample fitness cases. It scores 98 hits. Its
out-of-sample error rate is only 1.6%. This program consists of 105 points and is
shown below:
(progn (defun ADF0 ()
         (values (ORN (ORN (ORN (r?) (H?)) (ORN (P?) (c?)))
                      (ORN (ORN (ORN (Y?) (N?)) (ORN (r?) (Q?)))
                           (ORN (A?) (H?))))))
       (defun ADF1 ()
         (values (ORN (ORN (ORN (A?) (T?)) (ORN (I?) (W?)))
                      (ORN (ORN (I?) (L?)) (ORN (T?) (W?))))))
       (defun ADF2 ()
         (values (ORN (ORN (ORN (ORN (ORN (D?) (E?))
                                     (ORN (ORN (ORN (D?) (E?))
                                               (ORN (ORN (T?) (W?)) (ORN (Q?) (D?))))
                                          (ORN (K?) (P?))))
                                (ORN (K?) (P?)))
                           (ORN (T?) (W?)))
                      (ORN (ORN (E?) (A?)) (ORN (N?) (R?))))))
       (progn (loop-over-residues
                (SETM0 (+ (- (ADF1) (ADF2)) (SETM3 M0))))
              (values (% (% M3 M0)
                         (% (% (% (- LEN -0.53) (* M0 M0))
                               (+ (% (% M3 M0) (% (+ M0 M3) (% M1 M2)))
                                  M2))
                            (% M3 M0))))))
Ignoring the three residues common to the definition of both ADF1 and
ADF2, ADF1 returns 1 if the current residue is I or L and ADF2 returns 1 if the
current residue is D, E, K, R, Q, N, or P. I and L are two of the seven
hydrophobic residues on the Kyte-Doolittle scale. D, E, K, R, Q, and N are six of the
seven hydrophilic residues, and P is one of the neutral residues.
In the iteration-performing branch of this program from generation 20
of run 3, M0 is the running sum of the differences of the values returned by
ADF1 and ADF2. M0 will be positive only if the hydrophobic residues in
the protein segment are so numerous that the occurrences of I and L
outnumber the occurrences of the six hydrophilic residues and one neutral
residue of ADF2. M3 is the same as the accumulated value of M0 except
that M3 lags M0 by one residue. Because the contribution to M3 in the
iteration-performing branch of the last residue is either 0 or 1, M3 is either
equal to M0 or is one less than M0.
The result-producing branch is equivalent to
$$\frac{\mathrm{M3}^{3}}{\mathrm{M0}\,(\mathrm{M0} + \mathrm{M3})\,(\mathrm{LEN} + 0.53)}.$$
The subexpression (- LEN -0.53) is always positive and therefore can
be ignored in determining whether the result-producing branch is positive
or nonpositive. Because of the close relationship between M0 and M3,
analysis shows that the result-producing branch identifies a protein segment
as a transmembrane domain whenever the running sum of the differences,
M0, is greater than 0, except for the special case when M0 = 1 and
M3 = 0. This special case occurs only when the running values of M0 and
M3 are tied at 0 and when the very last residue of the protein segment is I
or L (i.e., ADF1 returns 1).
Ignoring this special case, we can summarize the operation of this overall
best-of-all program from generation 20 of run 3 as follows: If the number of
occurrences of I and L in a given protein segment exceeds the number of
occurrences of D, E, K, R, Q, N, and P, classify the segment as a
transmembrane domain; otherwise, classify it as non-transmembrane.
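The two summarized rules (generation 11 of run 1 and generation 20 of run 3) have the same form: count one group of residues, count another, and compare. The Python sketch below expresses both with one hypothetical helper. It encodes only the verbal summaries given in the text, so it ignores the special case just mentioned and is not a transcription of the evolved programs; the test segments are invented.

```python
from collections import Counter


def excess(segment, favorable, unfavorable):
    """Occurrences of favorable residues minus occurrences of unfavorable ones."""
    counts = Counter(segment)
    return (sum(counts[r] for r in favorable)
            - sum(counts[r] for r in unfavorable))


def run1_gen11_rule(segment):
    """I, M, C, L must outnumber D, E, R, K, T, P, S."""
    return excess(segment, "IMCL", "DERKTPS") > 0


def run3_gen20_rule(segment):
    """I, L must outnumber D, E, K, R, Q, N, P."""
    return excess(segment, "IL", "DEKRQNP") > 0


for segment in ("MCMCDK", "ILILQN", "DEKR"):   # hypothetical toy segments
    print(segment, run1_gen11_rule(segment), run3_gen20_rule(segment))
```

The first toy segment shows that the two rules can disagree: M and C count in favor under the run 1 rule but are ignored by the run 3 rule.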
Figure 18.9 shows that the out-of-sample correlation closely tracks the
in-sample correlation in the neighborhood of generation 20 of run 3. At generation 20, the out-of-sample correlation is 0.968 and the in-sample correlation is 0.976.
Figure 18.9 Comparison of values of in-sample and out-of-sample correlation for run 3 of the
subset-creating version of the transmembrane problem.
Table 18.10 Statistics for the best-of-all program from run 3 for the subset-creating
version of the transmembrane problem.

Out-of-sample statistics               Best-of-run from generation 20
Number of fitness cases Nfc            250
Number of true positives Ntp           123
Number of true negatives Ntn           123
Number of false positives Nfp          2
Number of false negatives Nfn          2
In-sample correlation C                0.976
Out-of-sample correlation C            0.968
Standardized fitness                   0.016
Hits                                   98
Accuracy Q3                            98.4%
Error rate                             1.6%
Percentage of agreement Ca             98.4%
Percentage of overprediction Cna       98.4%
Table 18.11 Comparison of five methods for the subset-creating version of the
transmembrane problem.

Method                                          Error rate
von Heijne 1992                                 2.8%
Engelman, Steitz, and Goldman 1986              2.7%
Kyte-Doolittle 1982                             2.5%
Weiss, Cohen, and Indurkhya 1993                2.5%
Best-of-all genetically evolved program
from run 3 of the set-creating version          1.6%
Table 18.10 summarizes the measures of out-of-sample statistics for the
best-of-all program for the subset-creating version of the transmembrane problem
(i.e., the best of generation 20 of run 3).
For the reasons stated above, we prefer correlation for evolving prediction
programs and for measuring their performance. However, many papers on
predicting algorithms in the biological literature (particularly older papers)
use other measures, such as accuracy, percentage agreement, and error rate.
Weiss, Cohen, and Indurkhya (1993) use the common yardstick of error rate
to compare three methods in the biological literature with a new algorithm of
their own created using the SWAP-1 induction technique (Weiss and
Indurkhya 1991; Arikawa et al. 1992). Therefore, we present the comparison
below using error rate.
Table 18.11 shows the error rates for the four algorithms for recognizing
transmembrane domains reviewed by Weiss, Cohen, and Indurkhya (1993)
as well as the out-of-sample error rate of the best-of-all genetically evolved
program from generation 20 from run 3 for the subset-creating version of the
transmembrane problem. As can be seen, the error rate of the genetically
evolved program is the best of the five.
In fact, our second best genetically evolved program (from generation 11 of
run 1) also outscores the other four methods (with an out-of-sample error rate
of 2.0%).
We wrote a computer program to test the solution discovered by the SWAP-1
induction technique used in the first experiment of Weiss, Cohen, and
Indurkhya (1993). Our implementation of their solution produced an error
rate on our test data identical to the error rate reported by them on their own
test data (i.e., the 2.5% of row 4 of table 18.11). Weiss, Cohen, and Indurkhya
reported that the error rate for their method was superior to the error rate of
two of the other three methods (rows 1 and 2 of table 18.11) on their test data
and was equal to the error rate of one of the other three methods (row 3 of
table 18.11) on their test data.
We did not write computer programs for the other three methods and
therefore cannot say, with certainty, that our best genetically evolved program
is the best of the five methods on our test data. In any event, the genetically
evolved program clearly compares favorably to the other methods.
As mentioned above, after making 11 runs of this problem without
automatically defined functions, we noted that the values of out-of-sample
correlation achieved in all but one of the runs are clustered at relatively low
values of correlation. We therefore retrospectively defined the success
predicate for this problem as the attainment of an out-of-sample correlation C of
0.94. On the basis of this retrospective success predicate, only one of the
11 runs of this problem without automatically defined functions is deemed
successful and six of the 22 runs with automatically defined functions are
deemed successful.
Table 18.12 shows, in ascending order, the out-of-sample correlation
attained by the best-of-run individual from the six successful runs (out of 22)
and the generation on which the best value was attained.
Table 18.12 Values of out-of-sample correlation for six successful runs (out of 22) of
the subset-creating version of the transmembrane problem with ADFs.

Generation   Out-of-sample correlation
12           0.94
16           0.945
7            0.945
13           0.952
11           0.960
20           0.968

Figure 18.10 Performance curves for the subset-creating version of the transmembrane
problem showing that E_with = 1,020,000 with ADFs. (With defined functions: curves
P(M, i) and I(M, i, z) by generation; M = 4,000, z = 99%, R(z) = 15, N = 22;
E = 1,020,000 at generation 16.)

Table 18.13 Comparison table for the subset-creating version of the transmembrane
problem.

                                   Without ADFs   With ADFs
Average structural complexity S    71             122.0
Computational effort E             3,724,000      1,020,000
The average structural complexity, S_with, of the best-of-run programs from
the six successful runs (out of 22 runs) of the subset-creating version of the
transmembrane problem with automatically defined functions is 122.0
points.
Figure 18.10 presents the performance curves based on the 22 runs of the
transmembrane problem with automatically defined functions. The
cumulative probability of success, P(M,i), is 27% by generation 16 and is 27% by
generation 20. The two numbers in the oval indicate that if this problem is
run through to generation 16, processing a total of E_with = 1,020,000
individuals (i.e., 4,000 x 17 generations x 15 runs) is sufficient to yield a satisfactory
result for this problem with 99% probability.

18.9 SUMMARY FOR THE SUBSET-CREATING VERSION
Table 18.13 compares the average structural complexity, S_without and S_with,
and the computational effort, E_without and E_with, for the transmembrane
problem with automatically defined functions and without them. Note that the
value of 71 for S_without and the value of E_without of 3,724,000 without
automatically defined functions are based on only one run that satisfies the
retrospectively created success predicate.
In summary, automatically defined functions reduce the computational
effort necessary to yield a satisfactory result for this problem.
The error rate produced by the genetically evolved program for the
subset-creating version of the transmembrane problem described here is better than
the error rates reported for the four algorithms for recognizing transmembrane
domains reviewed in Weiss, Cohen, and Indurkhya (1993).
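The arithmetic behind the computational-effort figure with ADFs can be retraced in a few lines. The Python sketch below assumes the usual relation R(z) = ceil(log(1 - z) / log(1 - P(M, i))) for the number of independent runs required, and takes P(M, i) at generation 16 to be 6/22 (about 27%, the six successful runs out of 22). Both are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """Independent runs needed to succeed at least once with probability z."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(population, generation, p_success, z=0.99):
    """I(M, i, z) = M * (i + 1) * R(z)."""
    return population * (generation + 1) * runs_required(p_success, z)


r_with = runs_required(6 / 22)
e_with = individuals_to_process(4000, 16, 6 / 22)
print(r_with, e_with)

# Ratio of the two computational-effort values reported in table 18.13.
print(round(3_724_000 / e_with, 2))
```

With these inputs the sketch gives 15 runs and 1,020,000 individuals, the same figures as the oval in figure 18.10, and the effort without ADFs is about 3.65 times larger.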
18.10 THE ARITHMETIC-PERFORMING VERSION
In the foregoing set-creating version of the transmembrane problem, the
automatically defined functions were used for the purpose of set formation.
The only function in the function set of the function-defining branches is the
two-argument numerically-valued disjunctive function ORN. This approach
constrained the nature of the detectors that could be evolved in two ways.
First, an automatically defined function could return only +1 or -1. Second,
all the amino acid residues in any one automatically defined function were
given equal weight in the decision made by the ORN function.
We used this set-creating approach because our interest in the
transmembrane problem was motivated by the set-creating approach presented by Weiss,
Cohen, and Indurkhya (1993) at the First International Conference on
Intelligent Systems for Molecular Biology. However, the question naturally arises
as to whether genetic programming can solve this same problem without
predetermining that the automatically defined functions would be used solely
for the purpose of set formation.
In this section, we enable each automatically defined function to perform
arbitrary arithmetic operations as well as conditional operations and to
return a potentially more discriminating floating-point value (rather than just
+1 or -1).

Table 18.14 Partial tableau with ADFs for the arithmetic-performing version of the
transmembrane problem.
Terminal set for the function-defining branches ADF0, ADF1, and ADF2:
Random constants ℜbigger-reals and 20 zero-argument functions
(A?), (C?), ..., (Y?).
Function set for the function-defining branches ADF0, ADF1, and ADF2:
IFGTZ, +, -, *, %, and ORN.
Types of points:
    The IFGTZ operator (which is always at the root).
    Point in first argument (condition part) of IFGTZ.
    The IFGTZ operator when positioned other than at the root.
    Point in an arithmetic expression.
Rules of construction:
    The root node must be an IFGTZ.
    The condition (first) argument of an IFGTZ may contain any composition
    of ORN and the residue-detecting functions (A?), (C?), ..., (Y?).
    The second (then) and third (else) arguments of an IFGTZ may be only
    other IFGTZs or compositions of the arithmetic functions +, -, *, %, and
    the random constants ℜbigger-reals.
    An arithmetic function can have as its arguments only random
    floating-point constants or arithmetic functions.
This increase in flexibility is achieved by adding the four arithmetic operations (+, -, *, and %) and the conditional decision-making operator IFGTZ (If
Greater Than Zero) to the function set of the function-defining branches. The
two-argument numerically valued disjunctive function ORN is retained.
IFGTZ is a three-argument conditional decision-making operator that returns the result of evaluating its second argument if
its first argument is greater than zero, but otherwise returns the result of evaluating its third argument. Since there are side-effecting functions in this problem (i.e., the four setting functions), IFGTZ must be implemented as a macro
as described in section 12.2.
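The need for macro-like evaluation can be illustrated with a small sketch. It is written in Python rather than LISP, and the names `set_m0`, `ifgtz_lazy`, and `ifgtz_eager` are hypothetical stand-ins introduced only for this illustration. The lazy version receives its branches as unevaluated thunks, so only the selected branch runs; the eager version behaves like an ordinary function whose arguments are all evaluated first, so the side effect of the unselected branch leaks into memory.

```python
memory = {"M0": 0.0}  # hypothetical settable memory cell


def set_m0(value):
    """Side-effecting setter: stores the value and also returns it."""
    memory["M0"] = value
    return value


def ifgtz_lazy(condition, then_thunk, else_thunk):
    """Evaluate only the selected branch (macro-like behaviour)."""
    return then_thunk() if condition > 0 else else_thunk()


def ifgtz_eager(condition, then_value, else_value):
    """Ordinary function: the caller has already evaluated both branches."""
    return then_value if condition > 0 else else_value


memory["M0"] = 0.0
lazy_result = ifgtz_lazy(1.0, lambda: set_m0(5.0), lambda: set_m0(-5.0))
lazy_memory = memory["M0"]    # only the then-branch ran

memory["M0"] = 0.0
eager_result = ifgtz_eager(1.0, set_m0(5.0), set_m0(-5.0))
eager_memory = memory["M0"]   # the else-branch side effect overwrote M0
```

Both calls return the same value, but only the lazy version leaves the memory cell in the state implied by the chosen branch.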
Experience indicates that we can increase our ability to understand the
genetically evolved expressions arising from hierarchies of conditional decision-making operations such as IFGTZ by imposing a constrained syntactic
structure on the branches that use the conditional operation. The rules of construction for the constrained syntactic structure require that the root node of
each function-defining branch must always be IFGTZ. The program typically
contains additional IFGTZs; however, the first (condition) argument of every
IFGTZ must be a composition only of ORNs and the 20 residue-detecting
functions (A?), (C?), ..., (Y?). In particular, the first argument of an IFGTZ
never contains another IFGTZ. The second (then) and third (else) arguments
of each IFGTZ consist only of compositions of other IFGTZ operators, the
arithmetic operations (+, -, *, and %), and random floating-point constants.
Figure 18.11 Fitness curves (worst of generation, average, and best of generation, by generation) for one run of the arithmetic-performing version of the transmembrane problem.
The effect of these constraints is that the condition (antecedent) part of each
IFGTZ is truly a condition and the consequent parts of each IFGTZ are truly
consequents (i.e., another IFGTZ or an expression that evaluates to a floating-point value). In this problem, the effect of nesting an IFGTZ in either the then-part or the else-part of another IFGTZ is to create a hierarchy.
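The rules of construction described above can be expressed as a recursive check on a program tree. The following is a minimal sketch, assuming a hypothetical representation in which a function call is a tuple such as `("IFGTZ", cond, then, else)`, a residue-detecting function is a one-element tuple such as `("A?",)`, and a random constant is a Python number. None of these names come from the original system.

```python
ARITHMETIC = {"+", "-", "*", "%"}
RESIDUE_TESTS = {c + "?" for c in "ACDEFGHIKLMNPQRSTVWY"}  # the 20 detectors


def is_condition(node):
    """Condition part: compositions of ORN and residue-detecting functions only."""
    if not isinstance(node, tuple):
        return False
    if node[0] == "ORN" and len(node) == 3:
        return is_condition(node[1]) and is_condition(node[2])
    return len(node) == 1 and node[0] in RESIDUE_TESTS


def is_arithmetic(node):
    """Arithmetic expression: constants or arithmetic functions of such."""
    if isinstance(node, (int, float)):
        return True
    return (isinstance(node, tuple) and len(node) == 3
            and node[0] in ARITHMETIC
            and is_arithmetic(node[1]) and is_arithmetic(node[2]))


def is_ifgtz(node):
    """IFGTZ with a pure condition and consequents that are IFGTZs or arithmetic."""
    return (isinstance(node, tuple) and len(node) == 4 and node[0] == "IFGTZ"
            and is_condition(node[1])
            and all(is_ifgtz(arg) or is_arithmetic(arg) for arg in node[2:]))


def is_valid_branch(root):
    """The root of a function-defining branch must itself be an IFGTZ."""
    return is_ifgtz(root)


valid = ("IFGTZ", ("ORN", ("A?",), ("C?",)),
         ("+", 1.5, ("*", 2.0, 3.0)),
         ("IFGTZ", ("L?",), 4.0, -4.0))
nested_condition = ("IFGTZ", ("IFGTZ", ("A?",), 1.0, -1.0), 1.0, -1.0)
arithmetic_root = ("+", 1.0, 2.0)
```

In a run of genetic programming, a check of this kind would normally be unnecessary after generation 0 if crossover is restricted to exchange points of the same type, which is how structure-preserving crossover is usually arranged.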
Table 18.14 is a partial tableau with ADFs that contains only the differences
between this arithmetic-performing version of the transmembrane problem
and the subset-creating versions (tables 18.6 and 18.8).
Figure 18.11 shows the fitness curves for one run. At generation 0, the
in-sample correlation of the best-of-generation program is 0.532 and the standardized fitness is 0.234.
The out-of-sample error rate for the best of generation 5 of this run of the
arithmetic-performing version of the transmembrane problem equals the out-of-sample error rate of 1.6% for the best-of-all program from the subset-creating version of the transmembrane problem.
The best of generation 5 has an in-sample correlation of 0.912 resulting
from getting 114 true positives, 121 true negatives, 2 false positives, and 9 false
negatives over the 246 in-sample fitness cases. Its out-of-sample correlation
of 0.968 is the result of getting 123 true positives, 123 true negatives, 2 false
positives, and 2 false negatives over the 250 out-of-sample fitness cases. It
scores 98 hits.
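The quoted figures can be cross-checked from the four counts. The sketch below assumes that the correlation is the Matthews correlation coefficient and that standardized fitness is (1 - C)/2; both assumptions reproduce the numbers quoted in this section, but the function names are introduced here only for illustration.

```python
import math


def correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denominator = math.sqrt((tn + fn) * (tn + fp) * (tp + fn) * (tp + fp))
    if denominator == 0:
        return 0.0  # convention assumed here when a row or column is empty
    return (tp * tn - fn * fp) / denominator


def standardized_fitness(c):
    """Map a correlation in [-1, +1] to a fitness in [0, 1], where 0 is best."""
    return (1.0 - c) / 2.0


def error_rate(tp, tn, fp, fn):
    """Fraction of fitness cases that are classified incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)


in_sample = correlation(114, 121, 2, 9)       # best of generation 5, in-sample
out_of_sample = correlation(123, 123, 2, 2)   # same program, out-of-sample
out_error = error_rate(123, 123, 2, 2)
generation_0_fitness = standardized_fitness(0.532)
```

Under these assumptions the two correlations round to 0.912 and 0.968, the out-of-sample error rate is 4/250 = 1.6%, and a correlation of 0.532 corresponds to a standardized fitness of 0.234.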
(progn (defun ADF0 ()
         (values (IFGTZ (ORN (ORN (ORN (C?) (A?))
                                  (ORN (L?) (V?)))
                             (ORN (ORN (S?) (I?))
                                  (ORN (F?) (V?))))
                        (* (% (* 4.768 6.557) (- 0.138 2.525))
                           (* (% 0.971 1.964) (+ -0.35 -5.054)))
                        (- (+ (* -1.795 -3.919) (+ 8.518 0.052))
                           (+ (* -8.642 -9.429)
                              (- -7.407 3.464))))))
       (progn (looping-over-residues
                (SETM3 (+ (SETM1 (+ (ADF0) M3))
                          (SETM2 (% LEN M1)))))
              (values (+ M2 M2)))).
Figure 18.12 Comparison of values of in-sample and out-of-sample correlation for one run of
the arithmetic-performing version of the transmembrane problem.
We have omitted the two branches defining ADF1 and ADF2 in the
above program because they are ignored by the iteration-performing
branch.
ADF0 tests whether the current residue is in the set {S, C, A, L, V, I, F}. S is
a neutral residue. C, A, L, V, I, and F are six of the seven hydrophobic residues
(according to the categories of table 18.1). Moreover, the omitted residue (M)
is one of the least hydrophobic. If the current residue is in the specified set,
the first argument of the IFGTZ evaluates to +1; otherwise, it evaluates to -1.
The then-part and else-part of this particular IFGTZ are both merely compositions of the arithmetic operations and the random constants and do not
contain any additional IFGTZs. The root IFGTZ returns either +41.47 (the
result of evaluating the then-argument of the IFGTZ) or -54.91 (the result of
evaluating the else-argument of the IFGTZ). Specifically,
\[
\mathrm{adf0}(r) =
\begin{cases}
+41.47 & \text{if residue } r \in \{S, C, A, L, V, I, F\} \\
-54.91 & \text{otherwise}
\end{cases}
\]
Interestingly, the -1.32 ratio of the two genetically evolved constants, +41.47
and -54.91, in ADF0 is close to the -1.43 ratio of the sum of the Kyte-Doolittle values for all seven hydrophobic residues in table 18.1 (i.e., +21.5)
and the sum of the Kyte-Doolittle values for all six neutral and all seven
hydrophilic residues (i.e., -30.8).
The iteration-performing branch uses M1, M2, and M3 to perform a somewhat complicated calculation which can be rewritten as
\[
M_3(r) \leftarrow M_3(r-1) + \mathrm{adf0}(r) + \frac{len}{\mathrm{adf0}(r) + M_3(r-1)}
\]
and
\[
M_2(r) \leftarrow \frac{len}{\mathrm{adf0}(r) + M_3(r-1)}
\]
The result-producing branch doubles M2. The doubling can be ignored since
the final classification decision is made by the wrapper based on whether the
result-producing branch is positive or not (rather than on the magnitude of
the value returned). M2 has the same sign as its denominator, adf0(r) + M3(r - 1),
which is almost equal to the running sum M3. The difference
is that this denominator lags M3 and does not contain the contribution of the last
quotient term. The effect of this difference is not large since the value of the length,
LEN, of the protein segments in our fitness cases averages 23.0 and since ADF0
returns numbers with larger magnitudes (+41.47 and -54.91).
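The behaviour of the iteration-performing and result-producing branches can be traced with a short simulation. This is a sketch under stated assumptions: the memory cells start at zero, % is protected division returning 1 when the divisor is zero, the wrapper classifies a positive value as a transmembrane domain, and the two test segments are artificial strings chosen only to exercise both outcomes.

```python
def adf0(residue):
    """Evolved ADF0: +41.47 for residues in {S, C, A, L, V, I, F}, else -54.91."""
    return 41.47 if residue in "SCALVIF" else -54.91


def protected_divide(numerator, denominator):
    """Protected division: returns 1 when the divisor is zero."""
    return 1.0 if denominator == 0 else numerator / denominator


def result_producing_branch(segment):
    """Trace (SETM3 (+ (SETM1 (+ (ADF0) M3)) (SETM2 (% LEN M1)))) per residue."""
    m1 = m2 = m3 = 0.0          # assumed initial values of the memory cells
    length = len(segment)
    for residue in segment:
        m1 = adf0(residue) + m3                 # SETM1 uses the old M3
        m2 = protected_divide(length, m1)       # SETM2 uses the new M1
        m3 = m1 + m2                            # SETM3 stores the sum
    return m2 + m2                              # (values (+ M2 M2))


def is_transmembrane(segment):
    """Wrapper: a positive value is read as a transmembrane domain."""
    return result_producing_branch(segment) > 0


hydrophobic_segment = "LLVVIIAALLFFVVIIAALL"    # artificial example
hydrophilic_segment = "KKDDEERRNNQQKKDDEERR"    # artificial example
```

For segments dominated by one class of residue, the sign of the returned value follows the sign of the accumulated sum of the ADF0 values, which is the behaviour described above.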
Figure 18.12 compares, by generation, the in-sample correlation and out-of-sample correlation for this run. As can be seen, out-of-sample correlation
peaks at 0.968 on generation 5, but in-sample correlation increases from generation to generation. Overfitting is occurring after generation 5.
18.11 SUMMARY FOR THE ARITHMETIC-PERFORMING VERSION
The error rate produced by the genetically evolved program for the arithmetic-performing version of the transmembrane problem is equal to the error
rate produced by the genetically evolved program for the subset-creating
version. Thus, regardless of the choice for the function set for the function-defining branches, genetic programming improved on the error rate reported
for the four algorithms for recognizing transmembrane domains reviewed in
Weiss, Cohen, and Indurkhya (1993).
19 Prediction of Omega Loops in Proteins
This chapter uses genetic programming to evolve a program for predicting
whether a given protein segment is, or is not, an omega loop. As in chapter
18, there are set-creating and arithmetic-performing versions of the problem.
19.1 BACKGROUND ON OMEGA LOOPS
One possible way to approach the difficult problem of predicting the tertiary
structure of a protein from its primary structure (i.e., the protein folding problem) is to first solve the seemingly simpler problem of predicting the secondary structure of a protein from its primary structure. If one could accurately
predict a protein's secondary structure from its primary structure, one could
presumably change the representation of the problem of predicting the tertiary structure from the primary structure into a problem of predicting the
tertiary structure from the secondary structure. If one could then accurately
predict a protein's tertiary structure from its secondary structure, one would
presumably then have solved the protein folding problem.
Researchers have expended considerable effort on the problem of predicting the secondary structure of a protein from its primary structure both because
of the considerable interest in secondary structure in its own right and
because of this alluring idea of decomposing the more difficult tertiary-structure problem into the secondary-structure problem. However, since the secondary-structure problem is not even close to solution, it remains to be seen
whether solving it will actually prove to be useful in solving the tertiary-structure problem.
The α-helices and β-strands are well defined, easily recognized, highly
regular structures composed of a periodic pattern of hydrogen bonding and
repeating dihedral angles along the backbone of the protein. Interestingly,
the existence of these two regular secondary structures was correctly hypothesized from chemical principles (Pauling and Corey 1951; Pauling, Corey, and
Branson 1951) years before the atomic structures of the first proteins and their
α-helices and β-strands were actually observed (Kendrew 1958).
The α-helices and β-strands (each accounting for about a quarter of all the
amino acid residues of a typical protein) represent an important component
of protein structure. Prediction of secondary structures has traditionally
involved the classification of continuous subsequences of the amino acid
residues of a protein into three categories: α-helices, β-strands, and "other."
This "other" category is also sometimes called "random coil," a particularly confusing misnomer since the residues are neither random nor necessarily coiled (Leszczynski and Rose 1986). What is worse, this very large
"other" category does not contain just one kind of structure. Nonetheless,
since the structures that are contained in this "other" category are not as
regular, symmetric, well-defined, or easily recognized as the α-helices and
the β-strands, this three-way classification continues to receive considerable research attention.
A variety of technological approaches have been applied to the problem of predicting the secondary structure of a protein from its primary
sequence of residues (Chou and Fasman 1974a, 1974b; Doolittle 1987; Lesk
1988; Argos 1989; Bell and Marr 1990; Doolittle 1990; Fasman 1990;
Holbrook, Muskal, and Kim 1990; Muskal, Holbrook, and Kim 1990;
Lapedes et al. 1990; Muskal and Kim 1992; Wu et al. 1992; Hunter 1993;
Hunter, Searls, and Shavlik 1993; Zhang et al. 1993). However, up to the
end of 1993, these various predicting algorithms have had an accuracy of
no better than about 65% and a correlation of no better than about 0.40 for
the usual three-way classification.
Many reasons have been suggested as to why all of these efforts have not
been highly successful. One explanation is simply that the problem is difficult and the right approach has yet to be found. Another possible explanation
is that global considerations of tertiary structure have such a major effect on
the formation of the secondary structure that it may not be possible to solve
the secondary-structure problem without simultaneously solving the tertiary-structure problem. Yet another explanation is that the wrong question may be
being asked in connection with secondary structure. Specifically, the usual
three categories (and in particular the lack of refinement in the definition of
the "other" category) may be the wrong targets for this prediction problem.
Thus, several researchers have attempted to substitute entirely different
categories for the usual three, or to subdivide the "other" category into its
disparate components.
For example, Zhang et al. (1993) found that when self-organizing clustering techniques using neural networks are employed to permit the residues to
organize themselves into categories, six categories, rather than the usual three,
emerge. These six structural building blocks may be better targets than the
usual three categories.
Leszczynski and Rose (1986) have pursued the idea of devising different
categories by subdividing the "other" category using a novel category called
the omega loop. Omega loops account for about a quarter of the residues of
all proteins. The omega loop is therefore a potentially significant and useful
category. There would then presumably be four major categories: α-helices,
β-strands, omega loops, and a new category of "other" representing the
remaining quarter of the residues. However, the omega loop is not as
attractive as the α-helix and β-strand because it does not have a regular structure and because its definition is derivative.
An omega loop is a non-regular secondary structure that consists of a relatively short continuous segment of the main chain of the protein in three-dimensional space that resembles the Greek letter Ω (when viewed from the
right direction). An omega loop is defined in terms of segment length, distance between the ends of the segment, and the absence of other regular secondary structure (i.e., α-helices and β-strands). Specifically, the segment length
must be between six and 16 residues. The lower limit of six on segment length
eliminates the inverse turn, a known structure. The upper limit eliminates
most compound loops. The end-to-end distance between the α-carbons of
the first and the last residue of the segment must be within 10 Å. In addition,
this end-to-end distance may not exceed two-thirds of the maximum distance between any two α-carbons within the segment. The absence of other
secondary structures is based on the Kabsch and Sander dictionary of protein
secondary structure (Kabsch and Sander 1983), thus linking the definition of
the omega loop to the absence of α-helices and β-strands.
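The geometric part of this definition can be written as a short test on the α-carbon coordinates of a candidate segment. The sketch below is illustrative only: the function name, the coordinate format (one (x, y, z) tuple in Å per residue), and the boolean flag standing in for the Kabsch and Sander assignment are assumptions, and the two test segments are synthetic shapes rather than real protein coordinates.

```python
import math
from itertools import combinations


def is_omega_loop(ca_coords, has_regular_structure=False):
    """Apply the length, end-to-end distance, and two-thirds criteria."""
    n = len(ca_coords)
    if not 6 <= n <= 16:                  # segment length of 6 to 16 residues
        return False
    if has_regular_structure:             # helix or strand present in segment
        return False
    end_to_end = math.dist(ca_coords[0], ca_coords[-1])
    max_distance = max(math.dist(p, q) for p, q in combinations(ca_coords, 2))
    return end_to_end <= 10.0 and end_to_end <= (2.0 / 3.0) * max_distance


def arc(n_points, radius, start_deg, end_deg):
    """Synthetic loop: points spaced evenly along a circular arc in the xy-plane."""
    step = (end_deg - start_deg) / (n_points - 1)
    return [(radius * math.cos(math.radians(start_deg + i * step)),
             radius * math.sin(math.radians(start_deg + i * step)),
             0.0) for i in range(n_points)]


loop_like = arc(8, 6.0, 20.0, 340.0)                     # ends close together
straight = [(float(i), 0.0, 0.0) for i in range(8)]      # ends are the extremes
too_short = arc(5, 6.0, 20.0, 340.0)
```

The straight segment passes the 10 Å limit but fails the two-thirds rule, because its end-to-end distance equals the maximum distance within the segment; this is the criterion that gives the loop its Ω shape.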
The residues constituting an omega loop are usually exposed to a watery
milieu and consequently have a tendency to be hydrophilic or neutral. Since
residues lying on the surface of a folded protein also face a watery milieu, a
tendency toward hydrophilicity and neutrality is not uniquely associated
with omega loops.
Figure 19.1 shows the primary sequence of cobra neurotoxin venom (identified as 1CTX in the April 1993 release of the Brookhaven Protein Data Bank).
The 10 residues C, D, A, F, C, S, I, R, G, and K between positions 26 and 35 of
this 71-residue protein form an omega loop and are boxed.
Figure 19.2 shows the general structure of cobra neurotoxin venom (1CTX)
highlighting the omega loop at residues 26-35. The α-carbon of residue 26 at
the beginning of the omega loop and the α-carbon of residue 35 at the end of
the omega loop are within 5.321 Å of each other; the backbone of the intervening residues resembles the Greek letter Ω and protrudes away from the
main body of the protein and into the surrounding solvent.
19.2 PREPARATORY STEPS WITH ADFs
We now apply genetic programming to the problem of predicting whether a
given protein segment is an omega loop. Based on our experience without
automatically defined functions for the transmembrane problem (chapter 18),
we consider only the case in this chapter where automatically defined functions are being used.
IRCFITPDITSKDCPNGHVCYTKTW[CDAFCSIRGK]RVDLGCAATCPTVKT
GVDIQCCSTDNCNPFPTRKRP
Figure 19.1 Primary sequence of cobra neurotoxin venom 1CTX. The boxed omega loop at residues 26-35 is shown here in brackets.
Figure 19.2 General structure of cobra neurotoxin venom (1CTX) highlighting an omega loop
at residues 26-35.
Except for differences in the fitness cases and the success predicate, the
tableaux for the subset-creating version of the transmembrane problem (tables
18.6 and 18.8) and the partial tableau for the arithmetic-performing version
(table 18.14) apply to the omega-loop problem.
Note that the function set and terminal set used ignore the distance criterion that is part of the definition of an omega loop. Because of this limitation,
we do not expect to get a satisfactory solution to this problem.
Leszczynski and Rose (1986) listed 270 omega loops among 67 proteins. Since
1986, the Brookhaven Protein Data Bank has become available on CD-ROM.
However, the exact versions of only 45 of the 67 proteins cited in the 1986
article are present in the current April 1993 release of the Brookhaven Protein
Data Bank. We used these versions of the proteins to create our in-sample
set of fitness cases; however, we excluded 17 proteins with anomalies (e.g.,
gaps in numbering, ambiguous residues, nonstandard residues, etc.). Negative cases were created, where possible, from the 28 non-excluded proteins
by randomly choosing equally long segments not contained in any of the
protein's omega loops.
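The creation of a negative case can be sketched as rejection sampling. The sketch below is one possible reading and not a description of the procedure actually used: it assumes 1-based inclusive residue positions, interprets "not contained in" strictly as "not overlapping any omega loop", and gives up after a fixed number of attempts to reflect the "where possible" qualification. The function name and the `max_tries` parameter are hypothetical.

```python
import random


def sample_negative_segment(protein_length, loops, segment_length, rng,
                            max_tries=1000):
    """Return (start, end) of a random segment overlapping no loop, or None."""
    if segment_length > protein_length:
        return None
    for _ in range(max_tries):
        start = rng.randint(1, protein_length - segment_length + 1)
        end = start + segment_length - 1
        # Accept only if the segment lies wholly before or after every loop.
        if all(end < loop_start or start > loop_end
               for loop_start, loop_end in loops):
            return (start, end)
    return None  # no suitable segment was found for this protein


rng = random.Random(0)
ctx_loops = [(1, 15), (26, 35)]       # omega loops of 1CTX from table 19.1
negative = sample_negative_segment(71, ctx_loops, 10, rng)
impossible = sample_negative_segment(12, [(1, 12)], 6, rng)
```

A negative segment for the 10-residue loop at positions 26-35 of 1CTX is thus another 10-residue stretch of the same protein drawn from outside both of its omega loops.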
Table 19.1 In-sample fitness cases for the omega-loop problem.
PDB code  Length  Number of  Locations of omega loops
                  loops      (positive fitness cases)
351C       82      2         16-26, 51-62
1ABP      306      6         93-99, 142-149, 203-208, 236-249, 289-294, 299-304
2ACT      218      8         8-13, 58-64, 89-103, 139-144, 141-156, 182-192, 198-205, 203-209
1BP2      123      3         23-30, 25-39, 56-66
2BP2      130      3         30-37, 32-46, 68-75
2C2C      112      4         18-33, 30-43, 41-56, 74-89
3CNA      237      8         13-21, 97-104, 116-123, 147-155, 160-165, 199-209, 222-235, 229-237
3CPA      307      7         128-141, 142-156, 156-166, 205-213, 231-237, 244-250, 272-285
1CRN       46      1         33-44
1CTX       71      2         1-15, 26-35
1CYC      103      4         18-32, 30-43, 40-54, 70-84
1ECD      136      2         33-42, 41-49
3FXN      138      1         54-61
1HIP       85      4         20-26, 28-41, 43-49, 44-59
1LH1      153      2         41-53, 47-54
1MBN      153      1         40-47
1MBS      153      3         37-50, 49-54, 79-84
1NXB       62      1         6-13
2PAB      123      1         49-54
1PCY       99      4         6-13, 41-56, 63-68, 84-92
3PGM      230      6         11-25, 98-109, 109-120, 123-130, 132-145, 209-224
1REI      107      1         91-96
1RHD      293      9         34-43, 43-57, 60-73, 85-90, 99-105, 185-191, 193-199, 216-223, 284-293
1RNS      104      2         16-21, 67-76
1SBT      275      7         17-22, 37-44, 74-86, 96-101, 157-164, 181-187, 257-266
2SNS      141      3         43-52, 114-119, 136-141
2SSI      113      1         19-25
3TLN      316     11         24-31, 32-39, 44-53, 55-70, 91-97, 125-130, 188-203, 204-213, 214-219, 221-233, 248-255
The Chain column of this table specifies chain B for two of the 28 proteins.
Table 19.2 Out-of-sample fitness cases for the omega-loop problem.
PDB code  Length  Number of  Locations of omega loops
                  loops      (positive fitness cases)
351C       82      2         16-26, 51-62
155C      135      3         22-29, 48-55, 94-96
1ABP      306      6         93-99, 142-148, 203-208, 236-249, 289-294, 299-304
2ACT      218      8         8-13, 58-64, 89-103, 139-144, 141-156, 182-192, 198-205, 203-209
8ADH      374      5         14-21, 100-112, 115-122, 122-129, 282-287
3ADK      195      1         134-143
3APP      323      4         42-56, 130-137, 141-151, 184-192
1AZU      127      6         8-14, 34-45, 66-71, 72-82, 83-91, 111-117
3B5C       88      1         32-47
1BP2      123      3         23-30, 25-39, 56-66
2BP2      130      3         23-30, 25-39, 61-68
2C2C      112      4         18-33, 30-43, 41-56, 74-89
3CNA      237      8         13-21, 97-104, 116-123, 147-155, 160-165, 199-209, 222-235, 229-237
3CPA      307      7         128-141, 142-156, 156-166, 205-213, 231-237, 244-250, 272-285
5CPV      109      2         19-24, 65-79
1CRN       46      1         33-44
1CTX       71      2         1-15, 26-35
1CYC      103      4         18-32, 30-43, 40-54, 70-84
3CYT      104      4         19-33, 35-44, 41-55, 71-85
3CYT      104      4         19-33, 35-44, 41-55, 71-85
1ECD      136      2         33-42, 41-49
3FAB      220      3         72-77, 99-105, 132-140
1FDX       54      3         12-23, 30-41, 39-50
3FXN      138      1         54-61
1HIP       85      4         20-26, 28-41, 43-49, 44-59
1LH1      153      2         41-53, 47-54
2LHB      149      2         46-59, 55-64
7LYZ      129      4         18-25, 36-42, 44-52, 60-75
7LZM      162      1         134-139
1MBN      153      1         40-47
1MBS      153      3         37-50, 49-54, 78-84
2MHB      146      2         39-54, 47-57
2MHB      141      1         40-48
1NXB       62      1         6-13
2PAB      114      1         40-45
9PAP      212      7         8-13, 60-67, 86-100, 139-153, 175-185, 191-199, 198-203
1PCY       99      4         6-13, 41-56, 63-69, 84-92
3PGM      230      6         11-25, 98-109, 109-120, 123-130, 132-145, 209-224
1REI      107      1         91-96
1REI      107      1         91-96
1RHD      293      9         34-43, 43-57, 60-73, 85-90, 99-105, 185-191, 193-199, 216-223, 284-293
1RNS      104      2         36-41, 87-96
1SBT      275      7         17-22, 37-44, 74-86, 96-101, 157-164, 181-187, 257-266
2SNS      141      3         43-52, 114-119, 136-141
2SOD      152      5         51-59, 69-79, 104-110, 123-139, 133-138
2SOD      152      5         51-59, 69-79, 104-110, 123-139, 133-138
2SOD      152      5         51-59, 69-79, 104-110, 123-139, 133-138
2SOD      152      5         51-59, 69-79, 104-110, 123-139, 133-138
2SSI      113      1         19-25
3TLN      316     11         24-31, 32-38, 44-53, 55-70, 91-97, 125-130, 188-203, 204-213, 214-219, 221-233, 249-255
Table 19.1 shows the 28 proteins from which the in-sample set of fitness
cases was created. A total of 212 fitness cases was created from these 28 proteins (107 positive and 105 negative).
In every case where the exact version of a protein cited in the 1986 article
was not present in the April 1993 release of the Brookhaven Protein Data
Bank, a later version of the same protein was present (typically because the
earlier study was replaced with a later study with better resolution). We used
the most recent version of such proteins to create the out-of-sample set of
fitness cases. In each instance, we created a three-dimensional kinemage of
each of the newer versions using the PREKIN software (Richardson and
Richardson 1992) and visually verified that the residue numbers cited in the
1986 article still corresponded to an apparent omega loop. We excluded any
omega loop where this was not the case. Negative cases were created in the
same manner as described above.
Table 19.2 shows the 44 proteins from which the out-of-sample set of fitness cases was created. A total of 356 fitness cases was created from these
proteins (181 positive and 175 negative).
19.3 RESULTS FOR THE SUBSET-CREATING VERSION WITH ADFs
Except for the fitness cases and the success predicate, the tableaux for the
subset-creating version of the transmembrane problem (tables 18.6 and 18.8)
apply to this version of the omega-loop problem.
Figure 19.3 shows the fitness curves for one run of the subset-creating
version of the omega-loop problem. At generation 0, the in-sample correlation of the best-of-generation program is 0.302 and the standardized fitness is 0.349.
The best-of-generation predicting program from generation 11 has an in-sample correlation of 0.52 as a result of getting 75 true positives, 86 true negatives, 19 false positives, and 32 false negatives over the 212 in-sample fitness
cases. When tested on the out-of-sample set, this program has an out-of-sample
correlation of 0.57 as a result of getting 57 true positives, 55 true negatives, 15
false positives, and 16 false negatives over the 143 out-of-sample fitness cases.
It scores 78 hits.
(progn (defun ADF0 ()
         (values (ORN (ORN (ORN (A?) (I?)) (ORN (V?) (Q?)))
                      (ORN (M?) (I?)))))
       (defun ADF1 ()
         (values (ORN (H?) (T?))))
       (defun ADF2 ()
         (values (ORN (ORN (N?) (W?)) (ORN (I?) (W?)))))
       (progn (looping-over-residues
                (SETM0 (- (SETM1 M0) (ADF0))))
              (values (+ (IFLTE M0 (+ LEN -5.805)
                                (* (IFLTE M0 LEN LEN M0)
                                   (IFLTE LEN 5.006 2.078 M3))
                                (IFLTE M1 LEN M3 M3))
                         (+ (IFLTE M2 M3 M2 M2)
                            (+ (- (* (+ (IFLTE (+ 5.17 M1)
                                               LEN M3 LEN) M1)
                                     (% -4.02 M2))
                                  (- 5.654 LEN))
                               M1)))))).
This program can be simplified to the program shown below. Since ADF1
and ADF2 are ignored, they are deleted below. The two-argument numerically
valued disjunctive operator ORN is replaced by ORN*, which takes multiple
arguments.
Figure 19.3 Fitness curves (worst of generation, average, and best of generation, by generation) for one run of the subset-creating version of the omega-loop problem.
(progn (defun ADF0 ()
         (values (ORN* (A?) (I?) (V?) (Q?) (M?))))
       (progn (looping-over-residues
                (SETM0 (- (SETM1 M0) (ADF0))))
              (values (+ (IFLTE M0 (+ LEN -5.805)
                                (* LEN (IFLTE LEN 5.006 2.078 0))
                                0)
                         (IFLTE (+ 5.17 M1) LEN 0 LEN)
                         -5.654
                         LEN
                         (* 2 M1))))).
In this program ADF0 returns +1 when the current residue is A, I, V, Q, or
M and -1 otherwise.
The iteration-performing branch uses memory cell M0 to compute a negative running sum of the values returned by ADF0. For residues 26-35 of cobra
neurotoxin venom (1CTX), namely C, D, A, F, C, S, I, R, G, and K, ADF0 evaluates to +1 two times out of 10 and evaluates to -1 eight times out of 10. Thus,
M0 equals 8 - 2 = +6. Memory cell M1 lags the accumulated value of memory cell
M0 by one residue, so M1 is the negative running sum of all but the last value
returned by ADF0. Since ADF0 returns -1 for the last residue (K), M1 equals +5.
The result-producing branch computes the five-term sum shown above
in which the first two terms are IFLTEs. This expression returns a positive value, so the wrapper classifies this protein segment as being an omega
loop.
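The simplified program can be traced directly for the 1CTX segment. The following is a sketch under stated assumptions: memory cells start at zero, (IFLTE a b c d) returns c when a is less than or equal to b and d otherwise, and the function names are introduced only for this illustration.

```python
def adf0(residue):
    """Simplified ADF0: +1 for A, I, V, Q, or M and -1 otherwise."""
    return 1 if residue in "AIVQM" else -1


def iflte(a, b, c, d):
    """If a <= b return c, else d (no side effects, so eager evaluation is safe)."""
    return c if a <= b else d


def run_simplified_program(segment):
    """Return the final M0, the final M1, and the result-producing branch value."""
    m0 = m1 = 0                       # assumed initial values of the memory cells
    length = len(segment)
    for residue in segment:
        m1 = m0                       # (SETM1 M0): M1 lags M0 by one residue
        m0 = m1 - adf0(residue)       # (SETM0 (- ... (ADF0)))
    value = (iflte(m0, length - 5.805,
                   length * iflte(length, 5.006, 2.078, 0), 0)
             + iflte(5.17 + m1, length, 0, length)
             - 5.654
             + length
             + 2 * m1)
    return m0, m1, value


m0_final, m1_final, rpb_value = run_simplified_program("CDAFCSIRGK")
is_omega_loop_prediction = rpb_value > 0   # wrapper: positive means omega loop
```

For this segment both IFLTE terms take their else-branches, so the returned value is dominated by LEN and twice M1.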
Figure 19.4 compares, by generation, the in-sample correlation and out-of-sample correlation for this run. As can be seen, out-of-sample correlation peaks
at 0.57 for generations 11-14, but in-sample correlation increases relentlessly
over the entire range of generations. After generation 14, the evolved predicting programs are not getting better at predicting and overfitting is occurring.
Instead, they are being fitted more and more to the idiosyncrasies of the
particular available in-sample fitness cases. When this run was extended to
generation 50, the peak at generations 11-14 continued to be the high point
for out-of-sample correlation.
Figure 19.4 Comparison of values of in-sample and out-of-sample correlation for one run of
the subset-creating version of the omega-loop problem.
Figure 19.5 Fitness curves (worst of generation, average, and best of generation, by generation) for one run of the arithmetic-performing version of the omega-loop
problem.
19.4 RESULTS FOR THE ARITHMETIC-PERFORMING VERSION WITH ADFs
The arithmetic-performing version of the omega-loop problem is similar to
the arithmetic-performing version of the transmembrane problem in that the
function-defining branch is changed so that arithmetic and conditional
operations can be performed.
Except for the fitness cases and the success predicate, the original tableaux
for the subset-creating version of the transmembrane problem (tables 18.6
and 18.8), as modified by the partial tableau for the arithmetic-performing
version of the transmembrane problem (table 18.14), apply to the arithmetic-performing version of the omega-loop problem.
Figure 19.5 shows the fitness curves for one run. At generation 0, the
in-sample correlation of the best-of-generation program is 0.30 and the
standardized fitness is 0.35.
The best of generation 14 has an in-sample correlation of 0.453 resulting
from getting 78 true positives, 76 true negatives, 29 false positives, and 29 false
negatives over the 212 in-sample fitness cases. Its out-of-sample correlation
of 0.449 is the result of getting 134 true positives, 124 true negatives, 51 false
positives, and 47 false negatives over the 356 out-of-sample fitness cases. It
scores 72 hits.
(progn (defun ADF0 ()
         (values (IFGTZ (H?) 4.172 1.591)))
       (defun ADF1 ()
         (values (IFGTZ (ORN (ORN (R?) (K?)) (ORN (N?) (K?)))
                        (- (- -8.842 5.865)
                           (% (% -6.399 3.942) (% -5.531 8.623)))
                        (IFGTZ (ORN (E?) (F?))
                               (- (* -4.17 4.843) 6.434)
                               (% 0.798004 4.244)))))
       (defun ADF2 ()
         (values (IFGTZ (ORN (ORN (M?) (I?)) (V?))
                        (+ (% (+ 1.725 0.0520003) (- 8.738 1.476))
                           (% -8.943 7.316))
                        (% (% -8.943 7.316) (- -7.393 2.183)))))
       (progn (looping-over-residues
                (% (SETM1 (+ M1 (ADF2)))
                   (+ (* M2 M0) (* -2.037 LEN))))
              (values (IFLTE (- (IFLTE (- (IFLTE 6.061 M2 M0 M1)
                                          (* M1 1.677))
                                       (% (% M0 M0)
                                          (IFLTE M3 M3 -7.51 0.0160007))
                                       (% (- M0 M1) (+ M1 -5.334))
                                       (* (IFLTE LEN 6.771 4.685 -2.358)
                                          (+ M3 LEN)))
                                (* M1 1.677))
                             (% (% M0 M0) (* 9.91 M3))
                             (% (- M0 M1) (+ M1 -5.334))
                             (* (IFLTE LEN 6.771 4.685 -2.358)
                                (+ M3 LEN)))))).
In this program ADF0 and ADF1 are ignored by the iteration-performing
branch. M0, M2, and M3 all remain at zero throughout.
ADF2 returns the numerical value of -0.9777 if the current residue is M, I, or
V (all of which are hydrophobic) but otherwise returns +0.1277.
The iteration-performing branch, IPB0, uses M1 to create a running sum of
the numerical values returned by ADF2. Note that the subsequent division by
(* -2.037 LEN) plays no role in the accumulated values in M1.
The operation of the result-producing branch, RPB, can be illustrated with
residues 26-35 of cobra neurotoxin venom (1CTX), namely C, D, A, F, C, S, I,
R, G, and K. There is only one I among these 10 residues so M1 acquires the
value of 0.1712 in the iteration-performing branch, IPB0. The result-producing branch, RPB, then evaluates to 0.0332. Because this value is positive, the
wrapper classifies the protein segment as an omega loop.
Figure 19.6 Comparison of values of in-sample and out-of-sample correlation for one run of
the arithmetic-performing version of the omega-loop problem.
Had there been two occurrences of M, I, or V among the 10 residues
(instead of one), M1 would have been -0.9341 and the result-producing branch,
RPB, would have evaluated to -0.1490. Because this value is negative, the
wrapper would have classified the protein segment as not being an omega loop.
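Both cases can be reproduced with a short trace of the program. The sketch below makes several assumptions that should be checked against the listing: ADF2 is replaced by its two rounded return values (-0.9777 and +0.1277), % is protected division returning 1 for a zero divisor, the memory cells start at zero, and the constants of the result-producing branch are as read from the listing above. The second test segment is the 1CTX segment with its last residue changed to V, a hypothetical variant used only to obtain two hydrophobic hits.

```python
def protected_divide(a, b):
    """Protected division: returns 1 when the divisor is zero."""
    return 1.0 if b == 0 else a / b


def iflte(a, b, c, d):
    """If a <= b return c, else d."""
    return c if a <= b else d


def adf2(residue):
    """Rounded values returned by the evolved ADF2."""
    return -0.9777 if residue in "MIV" else 0.1277


def run_program(segment):
    """Return the final M1 and the value of the result-producing branch."""
    m0 = m2 = m3 = 0.0        # these cells are never set by the program
    m1 = 0.0
    length = len(segment)
    for residue in segment:
        m1 = m1 + adf2(residue)                    # (SETM1 (+ M1 (ADF2)))
    then_value = protected_divide(m0 - m1, m1 - 5.334)
    else_value = iflte(length, 6.771, 4.685, -2.358) * (m3 + length)
    inner = iflte(iflte(6.061, m2, m0, m1) - m1 * 1.677,
                  protected_divide(protected_divide(m0, m0),
                                   iflte(m3, m3, -7.51, 0.0160007)),
                  then_value, else_value)
    value = iflte(inner - m1 * 1.677,
                  protected_divide(protected_divide(m0, m0), 9.91 * m3),
                  then_value, else_value)
    return m1, value


m1_one_hit, rpb_one_hit = run_program("CDAFCSIRGK")     # one I
m1_two_hits, rpb_two_hits = run_program("CDAFCSIRGV")   # hypothetical variant
```

Because the rounded ADF2 values are used, the traced numbers agree with the figures quoted in the text only to about three decimal places.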
Figure 19.6 compares, by generation, the in-sample correlation and out-of-sample correlation for this run. As can be seen, out-of-sample correlation peaks
at 0.449 for generations 14 and 15, although in-sample correlation increases
monotonically over the entire range of generations.
19.5 SUMMARY FOR THE OMEGA-LOOP PROBLEM
The in-sample correlation of 0.52 for the subset-creating version of the omega-loop problem corresponds to an accuracy of 76% and the out-of-sample correlation of 0.57 corresponds to an accuracy of 78%. We are not aware of any
published efforts at predicting omega loops using machine learning techniques; however, as previously mentioned, efforts at predicting other secondary structures (employing different techniques and a distinctly
non-comparable statement of the predicting problem) have yielded values of
out-of-sample correlation and accuracy that are somewhat smaller but in the
same general neighborhood as 76% and 78%.
For the arithmetic-performing version of the omega-loop problem, the
in-sample correlation of 0.453 corresponds to an accuracy of 73% and the out-of-sample correlation of 0.449 corresponds to an accuracy of 72%. Thus, for
the small number of runs involved here, the arithmetic-performing version
did somewhat less well than the subset-creating version.
20 Lookahead Version of the Transmembrane Problem
This chapter considers a more difficult version of the transmembrane
problem (first considered in chapter 18).
20.1 THE PROBLEM
The version of the transmembrane problem of section 18.2 corresponds to the
first experiment reported in Weiss, Cohen, and Indurkhya 1993, in which the
protein sequence is pre-parsed into segments such that each segment is either
an entire transmembrane domain or an entirely non-membrane area of the
sequence. The goal of the classifying program is to classify the entire segment
into one of the two classes.
In the third and most difficult experiment of Weiss, Cohen, and Indurkhya
(1993), the entire protein sequence is presented and the goal of the classifying
program is to classify each individual residue of the sequence as to whether it
belongs to a transmembrane domain or to a non-transmembrane area of the
protein.
Transmembrane domains contain mostly hydrophobic residues and non-transmembrane areas contain mostly hydrophilic residues; however, transmembrane domains and non-transmembrane areas cannot be recognized by
examining any single residue. There are many hydrophobic residues in non-transmembrane areas and many hydrophilic residues in transmembrane
domains. Thus, the classification of an individual residue requires a calculation based on the characteristics of the neighborhood of the residue. However, one cannot compute the characteristics of a neighborhood until one
knows where the neighborhood begins and ends. The beginning and the end
of a neighborhood are determined by the fact that the members of a neighborhood share the overall characteristics of the neighborhood while the members of the adjacent neighborhoods share different overall characteristics. Yet
one cannot determine the characteristics of a neighborhood, the previous
neighborhood, or the next neighborhood until one knows where the neighborhood itself begins and ends.
It is necessary to parse the entire protein sequence in order to resolve this
problem. Moreover, it is necessary to tentatively parse at least part of the
sequence in order to decide where actually to parse the sequence (i.e.,
lookahead or backtracking is required in some form). After the parsing is
successfully done, all the individual residues in each now-identified neighborhood are classified uniformly.
20.2 PARTIAL PARSING
Since we are attempting to identify two kinds of areas of the protein, it seems
reasonable that the overall program for this version of the transmembrane
problem should contain two separate running calculations in order to recognize the end of each kind of area of the protein and the beginning of the next.
The difficulty of the parsing version of this problem can be appreciated if
one imagines scanning the residues (starting from the first residue at the
N-terminal end of the protein) and attempting to perform one or two running calculations based on the hydrophobicity, neutrality, and hydrophilicity
of the residues. What do such calculations actually do? How many residues
should be included in the window for performing each calculation? How do
we deal with the fact that each protein being considered has a different total
number of residues? How do we deal with the fact that we do not know the
total number of transmembrane domains, if any, in the protein? How do we
deal with our lack of foreknowledge of the number of residues in each transmembrane domain? And each non-transmembrane area? What is the condition for deciding that one kind of area has ended and the other kind of area
has started? How do we identify the boundary between a first area and a
second area if the boundary between the areas is defined by a change in the
value of some calculation that depends on information about residues outside and beyond the first area?
Parsing calls for nested iterations. We can tolerate an iteration, multiple
iterations, or even nested iterations in a genetically evolved program provided there is a cap on the computer time available to any one program. In
both the set-creating and arithmetic-performing versions of the transmembrane problem in chapter 18, the cap was ensured by not permitting evolutionary modification of the termination predicate of the iteration. This was
accomplished by restricting the iteration to one iterative pass over the finite,
known set of residues of each protein segment. The parsing required by this
new version of the transmembrane problem requires multiple iterative calculations at points that cannot be specified in advance. That is, we must evolve
termination predicates while we evolve various iterative calculations. Because
this problem involves a finite, known sequence, we can continue to enjoy the
benefits of an overall cap on computer time while evolving various termination predicates and various iterative calculations if we again restrict all activity of the evolved overall program to a single pass over the residues of the
protein sequence.
Figure 20.1 shows the various kinds of branches that might appear in an
overall computer program to solve this problem. One function-defining branch
is shown on the left side of the figure, but, in general, there could be many
such branches or none. Then, two pairs of branches are shown in the middle
of the figure. Each pair contains one iteration-performing branch and one
iteration-terminating branch. Finally, one result-producing branch, RPB, is shown
on the right side of the figure; however, this final branch may possibly be
deleted. In any event, the bodies of all branches are subject to evolution.
Figure 20.1 Hypothetical six-branch overall program.
We will actually use only a subset of these six branches in our approach to
the lookahead version of the transmembrane problem. Figure 20.2 shows the
four branches actually used; there are no explicit function-defining branches
and there is no separate result-producing branch.
Figure 20.2 Four-branch overall program actually used in the lookahead version of the transmembrane problem.
An overall behavior called loop-until-end-of-protein governs the
execution of the four branches.
When the two groups of two iterative branches are executed, a pointer
identifying the current residue of the protein is started at the first residue at
the N-terminal (i.e., the beginning of the protein sequence). Control then passes
to a behavior called looping-over-residues that encompasses the first
pair of branches, IPB0 and ITB0. The first of the four iterative branches is an
iteration-performing branch, IPB0, and the second is an iteration-terminating branch, ITB0. Initially, the body of the iteration-performing branch, IPB0, is
executed. Each time the body of IPB0 is executed, the pointer identifying the
current residue is advanced one residue along the protein toward the
C-terminal (i.e., the end of the protein sequence). Then the body of the iteration-terminating branch, ITB0, is executed. An iteration-terminating branch is
deemed to be satisfied when it returns a numerical value greater than 0. The
process of executing the pair, IPB0 and ITB0, is iterated until ITB0 becomes
satisfied (if ever). When ITB0 is satisfied, control passes to the second pair of
branches, IPB1 and ITB1.
The third of the four iterative branches is a second iteration-performing
branch, IPB1, and the fourth branch is a second iteration-terminating branch,
ITB1. Initially, the body of the iteration-performing branch, IPB1, is executed.
Each time the body of IPB1 is executed, the pointer is advanced one residue.
Then ITB1 is executed. If ITB1 is not satisfied, then IPB1 and ITB1 are iteratively executed until ITB1 becomes satisfied (if ever).
Since the number of transmembrane domains will vary among the different proteins represented by the set of fitness cases and is not known in advance
for any particular protein, these four branches are placed inside an outer loop.
This outer loop permits toggling between the two pairs of branches. Thus,
when the iteration-terminating branch, ITB1, is satisfied (if ever), control passes
back to the first pair of branches, IPB0 and ITB0.
At the instant when the pointer identifying the current residue reaches the
C-terminal of the protein, the iteration is terminated. Thus, regardless of the
imperfections of the two iteration-terminating branches (ITB0 and ITB1),
execution of the overall program is always limited to exactly one pass over
the finite, known set of residues of the protein sequence.
The four iterative branches are embedded in a wrapper that makes the
final decision as to whether to classify an individual residue as belonging to a
transmembrane domain or to a non-transmembrane area. The wrapper toggles
in synchrony with the satisfaction of the termination predicates of the iteration-terminating branches, ITB0 and ITB1. Specifically, the wrapper begins
by classifying all residues as being in a non-transmembrane area until the
first iteration-terminating branch, ITB0, is satisfied. The wrapper then classifies all residues as being in a transmembrane domain until the second iteration-terminating branch, ITB1, is satisfied. The wrapper then toggles back to
a non-transmembrane classification.
Let us consider some of the possibilities that may occur. If the first iteration-terminating branch, ITB0, does not cause termination before the entire
protein is examined, the second pair of iterative branches (IPB1 and ITB1)
will never be executed. If the second iteration-terminating branch, ITB1, does
not cause termination before the entire protein is fully examined, execution of
the overall program will be limited to only one completed series of executions of the pair, IPB0 and ITB0, and only one interrupted series of executions of the pair, IPB1 and ITB1. On the other hand, if ITB1 causes termination
before the entire protein is fully examined, then a second series of executions
of the pair, IPB0 and ITB0, will be possible. Similarly, if ITB0 causes the
termination of the second series of executions of the pair, IPB0 and ITB0,
before the entire protein is fully examined, then a second series of executions
of the pair, IPB1 and ITB1, will be possible. If the entire protein is still not
fully examined, both the pair, IPB0 and ITB0, and the pair, IPB1 and ITB1,
will have additional opportunities to be executed.
The following code employing the LOOP macro of Common LISP (Steele
1990) specifies the overall loop-until-end-of-protein behavior and
the looping-over-residues behavior for one protein for the lookahead
version of the transmembrane problem.
 1 (loop with residue-index = 0
 2       until (>= residue-index (length protein-sequence))
 3       do (loop initially (progn (setf M0 0.0) (setf M1 0.0)
 4                                 (setf M2 0.0) (setf M3 0.0))
 5                for res from residue-index
 6                below (length protein-sequence)
 7                for residue = (aref protein-sequence res)
 8                do (eval IPB0)
 9                until (> (eval ITB0) 0.0)
10                finally (progn (mark-as-non-transmembrane
11                                 residue-index res)
12                               (setf residue-index res)))
13          (loop initially (progn (setf J0 0.0) (setf J1 0.0)
14                                 (setf J2 0.0) (setf J3 0.0))
15                for res from residue-index
16                below (length protein-sequence)
17                for residue = (aref protein-sequence res)
18                do (eval IPB1)
19                until (> (eval ITB1) 0.0)
20                finally (progn (mark-as-transmembrane
21                                 residue-index res)
22                               (setf residue-index res)))
23       finally (return (wrapper (compute-correlation)))).
The outer loop starts on line 1 with residue-index initialized to 0 and
runs until residue-index is determined on line 2 to equal or exceed the
length of the protein-sequence. The scope of this outer loop goes until
the finally clause on line 23. The body of this outer loop contains two similar inner loops running between lines 3-12 and 13-22 (the first being associated with areas that will become marked as non-transmembrane areas and
the second associated with areas that will become marked as transmembrane
domains).
The first inner loop running between lines 3-12 initializes four settable variables, M0, M1, M2, and M3, to 0 on lines 3 and 4.
Line 5 starts a local loop counter called res that starts at the current residue (i.e., residue-index) and terminates the loop if res reaches the end of
the protein.
On line 7 the residue whose index is res is extracted by the
array-referencing function aref from the array protein-sequence and
assigned to residue.
Line 8 evaluates the iteration-performing branch IPB0 until the result of
evaluating the iteration-terminating branch, ITB0, on line 9 is greater than 0.
The whole loop iterates the evaluation of IPB0 and the testing of ITB0 until
either ITB0 is satisfied (i.e., becomes greater than zero) or the loop runs off the end
of the protein.
The finally clause of the first inner loop uses a progn to do two things
on lines 10-12. Lines 10 and 11 cause the entire area to be marked as being a
non-transmembrane area. These identifications are retained in a hidden vector of length (length protein-segment) and are compared on line 23 to
the correct classifications for each residue of the protein. Line 12 ends the first
inner loop; it sets residue-index to the value of res on which the first
inner loop ends.
The second inner loop running between lines 13-22 operates in a similar
manner, except that the settable variables are J0, J1, J2, and J3; IPB1 and
ITB1 are the iteration-performing and iteration-terminating branches; and
the delineated area is marked as a transmembrane domain, instead of a non-transmembrane area.
The finally clause of the outer loop on line 23 causes the areas marked as
being non-transmembrane areas on lines 10 and 11 and the areas marked as
being transmembrane domains on lines 20 and 21 to be passed to the fitness
calculation. The compute-correlation calculation computes the correlation between the identification made by the individual program for each residue of the protein and the correct classification of the residue. The wrapper
then converts the correlation into the standardized fitness of the individual
program.
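As a cross-check on this walk-through, the control structure of the listing can be transcribed into a short Python sketch. It is an illustration only, not the code used for the runs. The name run_program and the representation of each branch as a Python callable taking the sequence and the current index are assumptions; the settable variables and the fitness calculation of line 23 are omitted.

```python
def run_program(protein, ipb0, itb0, ipb1, itb1):
    """Label each residue True (transmembrane) or False (non-transmembrane).

    Each branch is a hypothetical callable f(protein, res) -> float standing
    in for the body of an evolved branch.
    """
    n = len(protein)
    labels = [False] * n
    residue_index = 0
    # Pair 0 delimits non-transmembrane areas; pair 1 delimits
    # transmembrane domains. The outer loop toggles between them.
    pairs = [(ipb0, itb0, False), (ipb1, itb1, True)]
    which = 0
    # Note: as in the listing, no progress is made if both terminating
    # branches are satisfied at the same residue.
    while residue_index < n:                # loop-until-end-of-protein
        ipb, itb, label = pairs[which]
        res = residue_index
        while res < n:                      # looping-over-residues
            ipb(protein, res)               # iteration-performing branch
            if itb(protein, res) > 0.0:     # iteration-terminating branch
                break
            res += 1
        for k in range(residue_index, res): # mark the delineated area
            labels[k] = label
        residue_index = res
        which = 1 - which
    return labels
```

Whatever the two terminating branches return, each residue is labeled exactly once, which mirrors the single pass over the protein that caps the computer time of every evolved program.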
If the calculation performed by the first iteration-performing branch, IPB0,
evolves in such a way as to become relevant to recognizing a non-transmembrane area, and if the first iteration-terminating branch, ITB0, evolves to
become relevant to recognizing the end of a non-transmembrane area, and if
the second iteration-performing branch, IPB1, evolves to become relevant to
recognizing a transmembrane domain, and if the second iteration-terminating branch, ITB1, evolves to become relevant to recognizing the end of a
transmembrane domain, then the overall program will be able to identify an
arbitrary number of transmembrane domains within a protein sequence and
correctly classify each residue in the protein sequence.
Presumably, the iteration-performing branches, IPB0 and IPB1, make some
calculation based on the characteristics of the area involved. However, recognizing the boundary for one kind of area requires recognizing that the characteristics of the area just beyond the yet-to-be-determined boundary are the
characteristics of the other kind of area.
True parsing requires nested iterations or recursion. Nested iterations would
have entailed more computer time than was available. We therefore compromised by using a form of partial parsing. This partial parsing uses a one-argument lookahead function, called LOOK, and the already-described
unnested iterative structure involving the four branches. The LOOK provides
a way to examine the characteristics of the part of the protein sequence just
beyond a yet-to-be-determined boundary. Execution of a LOOK does not change
the index of the current residue on which the overall program is operating.
However, if the argument to a LOOK contains a residue-examining function,
the residue referred to by that residue-examining function within the argument of the LOOK is the residue one ahead of what it would otherwise be.
Since the argument of a LOOK may contain other LOOKs, it is possible for a
program to look ahead as far as it likes (subject only to the limit on program
size and depth common to all programs in a run). The looking is, of course,
inherently biased in favor of relatively short lookaheads because it takes one
additional occurrence of the LOOK function in the program tree to achieve
each additional increment of lookahead. The argument to the LOOK function
is executed after entering the LOOK; LOOK is therefore implemented as a macro in LISP.
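The effect of LOOK can be illustrated with a small Python sketch of an evaluator that carries a lookahead offset. It is a sketch under stated assumptions rather than the LISP macro itself: the tuple encoding of expressions, the evaluate function, the restriction to the (VERY-PHOBIC) detector and addition, and the convention that looking past the end of the protein yields -1 are all hypothetical.

```python
VERY_PHOBIC_SET = set("IVLF")  # the four very hydrophobic residues

def evaluate(expr, protein, res, offset=0):
    """Evaluate a tiny expression tree at residue index res.

    expr is a number, the string "VERY-PHOBIC", or a tuple of the form
    ("LOOK", arg) or ("+", a, b). Only this small subset is modeled.
    """
    if isinstance(expr, (int, float)):
        return float(expr)
    if expr == "VERY-PHOBIC":
        pos = res + offset
        # Assumption: a lookahead past the last residue yields -1.
        if pos >= len(protein):
            return -1.0
        return 1.0 if protein[pos] in VERY_PHOBIC_SET else -1.0
    op = expr[0]
    if op == "LOOK":
        # The argument is evaluated one residue further ahead; res itself
        # (the current residue of the overall program) is unchanged.
        return evaluate(expr[1], protein, res, offset + 1)
    if op == "+":
        return (evaluate(expr[1], protein, res, offset)
                + evaluate(expr[2], protein, res, offset))
    raise ValueError(f"unsupported expression: {expr!r}")
```

Nesting ("LOOK", ("LOOK", "VERY-PHOBIC")) examines the residue two positions ahead, which shows why each additional increment of lookahead costs one more LOOK in the program tree.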
20.3 PREPARATORY STEPS
The fitness cases for this version of the problem are based on the same mouse
transmembrane proteins as before; however, in order to save computer time,
we include only proteins with either at least two transmembrane domains or
no more than 200 residues. The 47 qualifying in-sample proteins contain a
total of 22,981 residues (5,630 positive in-sample fitness cases and 17,351 negative in-sample cases). That is, about a quarter of all residues of the qualifying
proteins are in transmembrane domains. The 38 qualifying out-of-sample
proteins contain a total of 17,158 residues (4,572 positive out-of-sample fitness cases and 12,586 negative out-of-sample cases).
The set of 22,981 in-sample fitness cases is very large in comparison to other
problems in this book and in comparison to the set-creating and arithmetic-performing versions of the transmembrane problem (chapter 18). We have
already demonstrated that genetic programming is capable of evolving biochemically relevant subsets of amino acids using automatically defined functions (chapters 18 and 19). Since the lookahead version of the transmembrane
problem (with its huge number of fitness cases) is clearly going to be exceedingly time-consuming, we decided to supply, on a silver platter, the same sort
of detectors for hydrophobic and hydrophilic amino acids that were evolved
in previous problems. That is, we decided to directly insert functions in the
function set that ascertain whether the current residue belongs to certain subsets of amino acids. This approach has the advantage of concentrating the
available computer resources on the new and difficult aspect of partial parsing and lookahead, rather than on the already-demonstrated evolution of
detectors. However, even after eliminating the need to evolve the detectors, a
run of this problem to generation 20 requires a week of computer time.
The following five residue-detecting functions are used:
PHOBIC is a zero-argument function that returns +1 if the current residue
belongs to the hydrophobic category as defined in table 18.1 (i.e., it is either I,
V, L, F, C, M, or A) and returns -1 otherwise.
PHILIC is a zero-argument function that returns +1 if the current residue
belongs to the hydrophilic category (i.e., H, Q, N, E, D, K, or R) and returns -1
otherwise.
NEUTRAL is a zero-argument function that returns +1 if the current
residue belongs to the neutral category (i.e., G, T, S, W, Y, or P) and returns
-1 otherwise.
VERY-PHOBIC is a zero-argument function that returns +1 if the current
residue is one of the four residues with the highest numerical value of hydrophobicity on the Kyte-Doolittle hydrophobicity scale of table 18.1 (i.e., I, V, L,
or F) and returns -1 otherwise.
CHARGED is a zero-argument function that returns +1 if the current residue
is one of the four electrically charged residues (i.e., D, E, K, or R) and returns
-1 otherwise. These residues include the most hydrophilic residues in
table 18.1.
As usual, these five zero-argument functions are treated as terminals.
Because we are not evolving detectors for hydrophobicity and other categories of amino acids, the 20 zero-argument functions (A?), (C?), ... for
detecting a particular amino acid at the current residue are not used in this
version of the problem.
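The five detectors amount to membership tests on fixed sets of one-letter amino acid codes. A minimal Python sketch follows; the set memberships are taken from the definitions above, while the helper name detector and the use of an explicit residue argument (rather than an implicit current residue) are assumptions made for illustration.

```python
# Category memberships as listed in the text (one-letter amino acid codes).
PHOBIC_SET = set("IVLFCMA")
PHILIC_SET = set("HQNEDKR")
NEUTRAL_SET = set("GTSWYP")
VERY_PHOBIC_SET = set("IVLF")
CHARGED_SET = set("DEKR")

def detector(category):
    """Build a function that maps a residue to +1 (member) or -1 (non-member)."""
    def detect(residue):
        return 1 if residue in category else -1
    return detect

phobic = detector(PHOBIC_SET)
philic = detector(PHILIC_SET)
neutral = detector(NEUTRAL_SET)
very_phobic = detector(VERY_PHOBIC_SET)
charged = detector(CHARGED_SET)
```

Under these definitions the hydrophobic, hydrophilic, and neutral categories partition the 20 amino acids, so exactly one of the first three detectors returns +1 for any residue; the very hydrophobic and charged residues are subsets of the hydrophobic and hydrophilic categories, respectively.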
The terminal set, T_ipb0, for the first iteration-performing branch, IPB0, contains the five zero-argument numerically-valued residue-classifying functions,
the settable variables M0, M1, M2, and M3, and the random constants. That is,
T_ipb0 = {(PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), M0, M1, M2, M3, ℜ_bigger-reals}.
The terminal set, T_ipb1, for the second iteration-performing branch, IPB1, is
similar, but contains the settable variables J0, J1, J2, and J3.
T_ipb1 = {(PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), J0, J1, J2, J3, ℜ_bigger-reals}.
The function set, F_ipb0, for the first iteration-performing branch, IPB0, contains the four one-argument setting functions SETM0, SETM1, SETM2, and
SETM3, the lookahead function LOOK (described in the previous section), the
conditional IFLTE operator, the four arithmetic functions, and the numerically valued disjunctive function ORN.
F_ipb0 = {SETM0, SETM1, SETM2, SETM3, LOOK, IFLTE, +, -, *, %, ORN}.
The function set, F_ipb1, for the second iteration-performing branch, IPB1,
is similar, but contains the four one-argument setting functions, SETJ0, SETJ1,
SETJ2, and SETJ3.
F_ipb1 = {SETJ0, SETJ1, SETJ2, SETJ3, LOOK, IFLTE, +, -, *, %, ORN}.
The terminal set, T_itb0, for the first iteration-terminating branch, ITB0, contains the five zero-argument numerically-valued residue-classifying functions,
the settable variables, M0, M1, M2, and M3, and the random constants. That is,
T_itb0 = {(PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), M0, M1, M2, M3, ℜ_bigger-reals}.
The terminal set, T_itb1, for the second iteration-terminating branch, ITB1,
is similar, but contains the settable variables, J0, J1, J2, and J3.
T_itb1 = {(PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), J0, J1, J2, J3, ℜ_bigger-reals}.
Since the iteration-terminating branch uses the settable variables, but does
not change them, the function set, F_itb0, for the first iteration-terminating
branch, ITB0, is the same as F_ipb0, except for the deletion of the four setting
functions, SETM0, SETM1, SETM2, and SETM3.
F_itb0 = {LOOK, IFLTE, +, -, *, %, ORN}.
The function set, F_itb1, for the second iteration-terminating branch, ITB1,
is the same as F_ipb1, except for the deletion of the four setting functions, SETJ0,
SETJ1, SETJ2, and SETJ3, so that
F_itb1 = F_itb0.
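For readers unfamiliar with the non-arithmetic primitives in these function sets, the following Python sketch gives one plausible reading of the protected division %, IFLTE, and ORN. The definitions are assumptions based on the usual conventions for these functions (division by zero yields 1; IFLTE returns its third argument when its first is less than or equal to its second, and its fourth otherwise; ORN returns +1 if either argument is positive, and -1 otherwise, without evaluating its second argument when the first is positive). The names pdiv, iflte, and orn, and the use of zero-argument callables for the lazily evaluated arguments, are hypothetical.

```python
def pdiv(a, b):
    """Protected division (%): assumed to return 1 when the divisor is 0."""
    return 1.0 if b == 0 else a / b

def iflte(a, b, then_thunk, else_thunk):
    """IFLTE: evaluate then_thunk if a <= b, otherwise else_thunk.

    The last two arguments are callables so that only the selected
    branch is evaluated.
    """
    return then_thunk() if a <= b else else_thunk()

def orn(first_thunk, second_thunk):
    """ORN: numerically valued disjunction returning +1 or -1.

    Assumed to short-circuit: the second argument is not evaluated
    when the first is positive.
    """
    if first_thunk() > 0:
        return 1.0
    return 1.0 if second_thunk() > 0 else -1.0
```

With these conventions, a division by a settable variable that is still at its initial value of 0 returns 1, so such a subexpression has no effect when it appears as a divisor of a larger expression.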
There is no explicit result-producing branch in this problem.
The fitness measure and other aspects of the lookahead version of the transmembrane problem are the same as for the set-creating and arithmetic-performing versions of this problem (section 18.10).
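The fitness arithmetic can be sketched in a few lines of Python. This is an illustration under assumptions, not the original implementation: the correlation is taken to be the standard correlation coefficient for a two-by-two table of true and false positives and negatives, a zero denominator is mapped to a correlation of 0, and hits are truncated to an integer. The function names are hypothetical.

```python
import math

def correlation(tp, tn, fp, fn):
    """Correlation C between predicted and correct two-class labels.

    Assumption: the standard 2x2 correlation coefficient, with C = 0
    when the denominator vanishes.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def standardized_fitness(c):
    """Map C in [-1, +1] to [0, 1], where 0 is best."""
    return (1.0 - c) / 2.0

def hits(c):
    """100 times (1.0 minus standardized fitness); truncation is assumed."""
    return int(100 * (1.0 - standardized_fitness(c)))

# Out-of-sample counts reported in section 20.4 for the best of
# generation 0 of run 1.
c_out = correlation(tp=2853, tn=10824, fp=1792, fn=1689)
```

Applied to the out-of-sample counts quoted in the code comment, the sketch gives a correlation close to 0.48 and 74 hits, in agreement with the figures reported in section 20.4.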
Table 20.1 summarizes the key features of the lookahead version of the
transmembrane problem.
20.4 RESULTS
We first examine the run that produced the best-of-all individual (called run
1 herein).
As one would expect, the vast majority of the randomly generated programs in generation 0 have a correlation C of 0.0 indicating that they are no
better than random in predicting whether a residue belongs to a transmembrane domain. Many of these programs achieve their poor performance
because the result-producing branch returns the same value regardless of the
composition of the protein segment presented.
The best-of-generation predicting program from generation 0 of run 1 has
an in-sample correlation of 0.42 and a standardized fitness of 0.29 as a result
of getting 3,201 true positives, 14,723 true negatives, 2,948 false positives, and
2,109 false negatives over the 22,981 in-sample fitness cases. When tested on
the out-of-sample set, this program has an out-of-sample correlation of 0.48
and an out-of-sample standardized fitness of 0.26 as a result of getting 2,853
true positives, 10,824 true negatives, 1,792 false positives, and 1,689 false negatives over the 17,158 out-of-sample fitness cases. It scores 74 hits.
(loop-until-end-of-protein
  (looping-over-residues
    (SETM1 (SETM0 M0))
    (+ (LOOK (VERY-PHOBIC)) (- (VERY-PHOBIC) M1)))
  (looping-over-residues
    (SETJ1 (SETJ0 (CHARGED)))
    (% (* (PHOBIC) (NEUTRAL)) (% (PHOBIC) J2)))).
The first iteration-performing branch, IPB0, of this program repeatedly
sets M0 and M1 to M0's original value of 0.
Table 20.1 Tableau for the lookahead transmembrane problem.
Objective: Find a program to classify each individual residue of a protein sequence as to whether it lies in a transmembrane domain or a non-transmembrane area.
Architecture of the overall program: Two iteration-performing branches (IPB0, IPB1) and two iteration-terminating branches (ITB0, ITB1).
Parameters: Branch typing.
Terminal set for the iteration-performing branch IPB0: (PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), M0, M1, M2, M3, and the random constants ℜ_bigger-reals.
Terminal set for the iteration-performing branch IPB1: (PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), J0, J1, J2, J3, and the random constants ℜ_bigger-reals.
Function set for the iteration-performing branch IPB0: SETM0, SETM1, SETM2, SETM3, LOOK, IFLTE, +, -, *, %, and ORN.
Function set for the iteration-performing branch IPB1: SETJ0, SETJ1, SETJ2, SETJ3, LOOK, IFLTE, +, -, *, %, and ORN.
Terminal set for the iteration-terminating branch ITB0: (PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), M0, M1, M2, M3, and the random constants ℜ_bigger-reals.
Terminal set for the iteration-terminating branch ITB1: (PHOBIC), (PHILIC), (NEUTRAL), (CHARGED), (VERY-PHOBIC), J0, J1, J2, J3, and the random constants ℜ_bigger-reals.
Function set for the iteration-terminating branch ITB0: LOOK, IFLTE, +, -, *, %, and ORN.
Function set for the iteration-terminating branch ITB1: Same as for ITB0.
Fitness cases: Set of 22,981 in-sample residues from 47 mouse transmembrane proteins and 17,158 out-of-sample residues from 38 mouse transmembrane proteins.
Raw fitness: Correlation C (ranging from -1.0 to +1.0).
Standardized fitness: Standardized fitness is (1 - C)/2.
Hits: 100 times the difference of 1.0 minus standardized fitness for the out-of-sample set.
Wrapper: Labels each individual residue as being in a transmembrane domain or non-transmembrane area.
Parameters: M = 4,000. G = 21.
Success predicate: A program scores an out-of-sample correlation of 1.00.
The first iteration-terminating branch, ITB0, uses (LOOK (VERY-PHOBIC))
to look ahead to see if the next residue is very hydrophobic. It also uses (VERY-PHOBIC) to determine if the current residue is very hydrophobic. Thus, ITB0
can evaluate to -2, 0, or +2 and can be positive only when both the current
residue and the next residue are very hydrophobic. Thus, two consecutive
very hydrophobic residues will terminate execution of the first iteration-performing branch, IPB0. Even though transmembrane domains do not necessarily begin with two consecutive very hydrophobic residues and
non-transmembrane areas can sometimes contain two consecutive very
hydrophobic residues, the occurrence of two consecutive very hydrophobic residues does better than random in predicting the onset of a transmembrane domain.
The second iteration-performing branch, IPB1, of the best of generation 0
sets J0 and J1 to -1 or +1 according to whether the current residue is electrically charged; however, no use is ever made of J0 or J1.
The second iteration-terminating branch, ITB1, can be simplified to the
product of (PHOBIC) and (NEUTRAL) for the current residue. Since
(PHOBIC) and (NEUTRAL) cannot simultaneously be +1, ITB1 can be positive only if (PHOBIC) and (NEUTRAL) are both -1. This occurs when the
current residue is neither hydrophobic nor neutral (i.e., when the current residue is hydrophilic). Thus, the putative transmembrane domain predicted by
ITB0 is ended by the occurrence of the first hydrophilic residue.
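The behavior of these two termination predicates can be checked with a short Python sketch. It hard-codes the simplified forms derived above rather than interpreting the evolved S-expressions; the function names, the explicit index argument, and the convention that a lookahead past the last residue counts as -1 are assumptions.

```python
VERY_PHOBIC_SET = set("IVLF")
PHOBIC_SET = set("IVLFCMA")
NEUTRAL_SET = set("GTSWYP")

def sign(residue, category):
    """Return +1 if the residue is in the category and -1 otherwise."""
    return 1 if residue in category else -1

def itb0(protein, res):
    """(+ (LOOK (VERY-PHOBIC)) (- (VERY-PHOBIC) M1)) with M1 fixed at 0."""
    current = sign(protein[res], VERY_PHOBIC_SET)
    # Assumption: looking past the end of the protein yields -1.
    if res + 1 < len(protein):
        ahead = sign(protein[res + 1], VERY_PHOBIC_SET)
    else:
        ahead = -1
    return ahead + (current - 0)

def itb1(protein, res):
    """Simplified ITB1: the product of (PHOBIC) and (NEUTRAL)."""
    return sign(protein[res], PHOBIC_SET) * sign(protein[res], NEUTRAL_SET)

# A hypothetical seven-residue sequence used only for illustration.
toy = "GALIKDS"
itb0_values = [itb0(toy, i) for i in range(len(toy))]
itb1_values = [itb1(toy, i) for i in range(len(toy))]
```

On the toy sequence, itb0 is positive only at the L that is followed by I, and itb1 is positive only at the hydrophilic residues K and D.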
Figure 20.3 shows the 446 residues of D3DR_MOUSE. This 446-residue
protein has seven transmembrane domains (boxed in the figure) at positions
33-55, 67-92, 105-126, 150-172, 186-209, 375-399, and 413-434. As it happens, D3DR_MOUSE is not included in the set of fitness cases because its
length is greater than 200.
Figure 20.4 is an analysis of the behavior of the best of generation 0 of run 1
on the 446 residues of D3DR_MOUSE. The seven transmembrane domains
are boxed in the figure. The residues responsive to the (VERY-PHOBIC)
detector are underlined in boldface in this particular figure. The solid areas
denote the correctly classified residues. Specifically, the solid black areas
denote true positives and the solid gray areas denote true negatives. The
hatched areas denote incorrectly classified residues. Northeasterly hatching
denotes false negatives and northwesterly hatching denotes false positives.
The best of generation 0 of run 1 misclassifies 81 of the 446 residues of
D3DR_MOUSE. This program starts by classifying residues 1-38 as a non-transmembrane area. The first transmembrane domain actually starts at residue 33, so this program misses the actual starting point by six residues and
misclassifies residues 33-38. The first 32 residues are shown in solid gray
since they are true negatives and residues 33-38 are given a northeasterly
hatching since they are false negatives. Residues 39 and 40 (L and I) are both
very hydrophobic (indicated by the underlined boldface type), so ITB0 is
satisfied and the residues starting at position 39 are correctly classified as
being in a transmembrane domain. Residue 47 is hydrophilic (i.e., not neutral
and not hydrophobic), so ITB1 is satisfied and residue 48 is misclassified as
non-transmembrane (and is given northeasterly hatching showing that it is a
false negative). Residues 39-47 are shown in solid black since they are true
positives. Control then toggles back to the first pair of branches, IPB0 and
ITB0. The misclassification is immediately corrected because of the fortuitous occurrence in this particular protein of two consecutive very hydrophobic residues at positions 48 and 49 thereby satisfying ITB0. The residues
starting at position 49 are correctly classified as being in a transmembrane
domain. Residues 49 and 50 (and beyond) are therefore shown in solid black
since they are true positives.
Figure 20.3 The 446 residues of D3DR_MOUSE.
Figure 20.4 Analysis of the behavior of the best of generation 0 of run 1 for the lookahead
version of the transmembrane problem.
Residue 57 is hydrophilic, so ITB1 is satisfied and the residues starting at
position 57 are correctly classified as non-transmembrane (and are solid gray
denoting true negatives). However, the first transmembrane domain actually
ends at residue 55, so this program misses the actual ending and misclassifies
residue 56 (given a northwesterly hatching showing that it is a false positive).
A total of eight errors have been made up to this point in this protein. The
second transmembrane domain actually starts at residue 67. Because this particular protein happens to have two consecutive very hydrophobic residues
at positions 67 and 68, this program correctly identifies the exact beginning of
this transmembrane domain.
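The count of eight errors can be reproduced by comparing the classification described above with the correct one, residue by residue, over positions 1 through 66. The Python sketch below does this; the helper tm_labels and the variable names are hypothetical, and the predicted ranges are transcribed from the preceding narrative rather than generated by running the evolved program.

```python
def tm_labels(domains, length):
    """Return a 1-based list of booleans; True marks a transmembrane residue."""
    labels = [False] * (length + 1)   # index 0 is unused padding
    for first, last in domains:
        for pos in range(first, last + 1):
            labels[pos] = True
    return labels

LAST = 66  # last residue before the second transmembrane domain begins

# Correct classification: the first domain occupies residues 33-55.
actual = tm_labels([(33, 55)], LAST)
# Classification described in the text for the best of generation 0:
# transmembrane for 39-47 and 49-56, non-transmembrane elsewhere.
predicted = tm_labels([(39, 47), (49, 56)], LAST)

errors = [pos for pos in range(1, LAST + 1) if actual[pos] != predicted[pos]]
false_negatives = [pos for pos in errors if actual[pos]]
false_positives = [pos for pos in errors if predicted[pos]]
```

The mismatches are residues 33-38 and 48 (seven false negatives) and residue 56 (one false positive), for a total of eight.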
The presence of two consecutive very hydrophobic residues is not a very
good indicator of the beginning of a transmembrane domain. However, it is
sometimes correct (as at the beginning of the second transmembrane domain
at residue 67). Moreover, since a transmembrane domain does contain a preponderance of hydrophobic residues, it often harbors two consecutive very
hydrophobic residues. Thus, the correct beginning of the first transmembrane
domain at residue 33 is missed by this imperfect trigger; however, the
domain is belatedly detected when the L and I appear at positions 39 and 40.
The above best random program is somewhat better than certain similar
programs present in generation 0. For example, when the first iteration-terminating branch, ITB0, uses (PHOBIC) instead of (VERY-PHOBIC), the
resulting hypothetical program makes 93 errors, instead of 81 errors.
The best-of-generation predicting program from generation 6 of run 1 has
an in-sample correlation of 0.58 as a result of getting 4,004 true positives, 15,255
true negatives, 2,416 false positives, and 1,306 false negatives over the 22,981
in-sample fitness cases. When tested on the out-of-sample set, this program
has an out-of-sample correlation of 0.63 as a result of getting 3,500 true positives, 11,135 true negatives, 1,481 false positives, and 1,042 false negatives
over the 17,158 out-of-sample fitness cases. It scores 81 hits.
(loop-until-end-of-protein
  (looping-over-residues
    (% (ORN (% (VERY-PHOBIC) (PHILIC)) (SETM1 (PHILIC)))
       (LOOK (% (NEUTRAL) (PHOBIC))))
    (LOOK (IFLTE (ORN M1 (PHILIC))
                 (IFLTE M1 (PHOBIC) M2 (PHOBIC))
                 (LOOK (* (LOOK (IFLTE (PHOBIC) M1 M3 (NEUTRAL)))
                          (* (IFLTE (PHOBIC) (NEUTRAL) M3 (PHILIC))
                             (ORN (PHOBIC) 2.632))))
                 (% M0 (CHARGED)))))
  (looping-over-residues
    (+ (SETJ2 (PHOBIC)) (+ J2 J0))
    (LOOK (CHARGED)))).
Figure 20.5 is an analysis of the behavior of the best of generation 6 of run L
on D3DR_MOUSE. The underlined boldfaced type is used in this particular
figure to denote the residues responsive to the (CHARGED ) detector in this
figure.
The best of generation 6 misclassifies 67 of the 446 residues of the
D3DR-MOUSE protein.
The first iteration-terminating branch, ITB0, of the best of generation 5 of
run 1 is not very good at detecting the beginnings of transmembrane
domains of this protein. It belatedly identifies the beginning of five of the
seven actual transmembrane domains (three domains by two residues and
two domains by six residues). It correctly identifies the exact beginning of
one of the seven actual transmembrane domains. It totally misses the seventh
transmembrane domain at 413-434 (this large false negative area being indicated by the large northeasterly hatched area at the bottom of figure 20.5).
The (LOOK (CHARGED)) of ITB1 of the best of generation 6 does a reasonably good job of detecting the ends of the seven actual transmembrane
domains of this protein. ITB1 correctly identifies the exact ending of three of
the seven actual transmembrane domains; it prematurely terminates one transmembrane domain by three residues; it belatedly terminates two transmembrane domains by four and nine residues. It also incorrectly terminates the
second transmembrane domain at residue 75.
Figure 20.6 shows the fitness curves for run 1 of the lookahead version of
the transmembrane problem.
The best-of-generation predicting program from generation 19 of run 1 has
an in-sample correlation of 0.68 as a result of getting 4,121 true positives, 16,162
true negatives, 1,509 false positives, and 1,189 false negatives over the 22,981
in-sample fitness cases. When tested on the out-of-sample set, this program
has an out-of-sample correlation of 0.6988 as a result of getting 3,549 true
positives, 11,593 true negatives, 1,023 false positives, and 993 false negatives
over the 17,158 out-of-sample fitness cases. This corresponds to an out-of-sample error rate of 11.7%. It scores 84 hits.
(loop-until-end-of-protein
(looping-over-residues
(% (ORN (% (VERY-PHOBIC) (PHILIC)) (% (ORN (% (VERY-PHOBIC) (PHILIC)) (SETM1 (PHILIC))) (LOOK (% 6.636
M3)))) (LOOK (% (LOOK (% (% (ORN (% (VERY-PHOBIC)
(PHILIC)) (SETM1 (PHILIC))) (LOOK (% 6.635 M3)))
(PHOBIC))) (LOOK (% 6.636 M3)))))
(LOOK (IFLTE (ORN M1 (PHILIC)) (IFLTE M1 (PHOBIC) M2
(PHOBIC)) (LOOK (IFLTE (ORN M1 (PHILIC)) (* (% M0
(CHARGED)) -5.229) (LOOK (IFLTE (ORN M1 (PHILIC))
(IFLTE (ORN M1 (PHILIC)) (* (LOOK (IFLTE (PHOBIC) M1
M3 (NEUTRAL))) (* (% M0 (CHARGED)) -5.229)) (LOOK (*
(LOOK (IFLTE (PHOBIC) M1 M3 (NEUTRAL))) (* (IFLTE
(PHOBIC) (NEUTRAL) M3 (PHILIC)) (* (NEUTRAL)
-5.229)))) (* (IFLTE (PHOBIC) (NEUTRAL) M3 (PHILIC))
(* (NEUTRAL) -5.229))) (LOOK (LOOK (IFLTE (ORN M1
(PHILIC)) (IFLTE (PHOBIC) M1 M3 (NEUTRAL)) (LOOK (*
(LOOK (IFLTE (PHOBIC) M1 M3 (NEUTRAL))) (* (IFLTE
(PHOBIC) (NEUTRAL) M3 (PHILIC)) (ORN (PHOBIC)
2.632)))) (% M0 (CHARGED))))) (% M0 (CHARGED)))) (%
M3 (CHARGED)))) (% M0 (CHARGED))))
(looping-over-residues
(ORN (SETJ2 (CHARGED))
(+ (ORN (NEUTRAL) .11) (* J0 (CHARGED))))
(LOOK (CHARGED))).
Figure 20.5 Analysis of the behavior of the best of generation 6 of run 1 for the lookahead
version of the transmembrane problem.
Figure 20.6 Fitness curves for run 1 of the lookahead version of the transmembrane problem.
Figure 20.7 is an analysis of the behavior of the best of generation 19 of run
1 on D3DR_MOUSE.
The best of generation 19 of run 1 misclassifies only 30 of the 446 residues
of the D3DR_MOUSE protein. The first iteration-terminating branch, ITB0,
of the best of generation 19 belatedly identifies the beginning of four of the
seven actual transmembrane domains (by between one and four residues); it
correctly identifies the exact beginning of one of the seven actual transmembrane domains; it prematurely identifies the beginning of two transmembrane
domains (by one residue each). Unlike the program from generation 6, this
program does not miss the seventh transmembrane domain. In turn, ITB1
correctly identifies the exact ending of the seventh transmembrane domain.
This program found the correct number of transmembrane domains (but for
the incorrect interruption of the third transmembrane domain at residues
110-111).
Figure 20.8 compares, by generation, the in-sample correlation and out-of-sample correlation for run 1.
The fact that neither the out-of-sample correlation nor the in-sample correlation had peaked by generation 20 in figure 20.8 suggests that running this
problem for additional generations might be productive.
Figure 20.9 shows the fitness curves for the third best run (called run
2 here) of this problem.
Figure 20.7 Analysis of the behavior of the best of generation 19 of run 1 for the lookahead
version of the transmembrane problem.
Figure 20.8 Comparison of values of in-sample and out-of-sample correlation for run 1 of
the lookahead version of the transmembrane problem.
Figure 20.9 Fitness curves for run 2 of the lookahead version of the transmembrane problem.
In run 2 of this problem, the best-of-generation predicting program from
generation 20 has an in-sample correlation of 0.6343 as a result of getting
4,226 true positives, 15,516 true negatives, 2,155 false positives, and 1,084
false negatives over the 22,981 in-sample fitness cases. When tested on the
out-of-sample set, this program has an out-of-sample correlation of 0.6638
as a result of getting 3,655 true positives, 11,150 true negatives, 1,466 false
positives, and 887 false negatives over the 17,158 out-of-sample fitness
cases. This corresponds to an out-of-sample error rate of 13.7%. It scores
83 hits.
(loop-until-end-of-protein
(looping-over-residues
(SETI2 I2)
(IFLTE (* I1 (VERY-PHOBIC)) (LOOK (PHOBIC)) (+ (LOOK (LOOK
(+ (LOOK (LOOK (LOOK (PHOBIC)))) (+ (LOOK (LOOK (PHOBIC))) (+ (LOOK (LOOK (LOOK (LOOK (+ (LOOK (PHOBIC))
(IFLTE (PHILIC) (PHOBIC) (VERY-PHOBIC) (NEUTRAL))))))))
(ORN (LOOK (ORN (ORN I1 (PHOBIC)) (PHOBIC))) (+ I0 (VERY-PHOBIC))))))))) (ORN (+ (VERY-PHOBIC) 0.517) (* I2 I2)))
(LOOK I2)))
(looping-over-residues
(* (LOOK J2) (- (PHILIC) (VERY-PHOBIC)))
(ORN J0 (LOOK (CHARGED))))).
Figure 20.10 is an analysis of the behavior of the best of generation 20 of run
2 on D3DR_MOUSE.
Figure 20.11 compares, by generation, the in-sample correlation and out-of-sample correlation for run 2.
Run 2 was actually extended to generation 33 where in-sample correlation
improved to 0.6652 and out-of-sample correlation improved to 0.6881. However, this value is not as good as the value obtained on generation 19 of run 1.
Nonetheless, the operation of the program here is very similar to the best of
run 1. Notice, for example, the same error at residues 110-111 in the third
transmembrane domain.
Table 20.2 shows the best values of out-of-sample correlation for five runs
of the lookahead version of the transmembrane problem.
Weiss, Cohen, and Indurkhya (1993) wrote a program that did a full parse
(rather than the partial parsing with lookahead used here) and achieved a
superior error rate of about 8%. Thus, the genetically evolved partial parser
did almost as well as a full parser written by a human researcher.
Figure 20.10 Analysis of the behavior of the best of generation 20 of run 2 for the lookahead
version of the transmembrane problem.
Figure 20.11 Comparison of values of in-sample and out-of-sample correlation for run 2 of the
lookahead version of the transmembrane problem.
Table 20.2 Best values of out-of-sample correlation for five runs of the lookahead
version of the transmembrane problem.
Run   Generation   Out-of-sample correlation   Error
1     19           0.6988                      11.7%
2     20           0.68M                       12.3%
3     20           0.6638                      13.7%
4     17           0.6556                      13.2%
5     20           0.6541                      13.5%
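The correlation and error values reported for run 1 and for the run called run 2 in the text can be checked against the out-of-sample counts of true and false positives and negatives quoted above. The following is a minimal sketch in Python (not the language of the original system). It assumes that the correlation is the usual correlation coefficient for a two-class confusion matrix; the function names are illustrative only.

```python
from math import sqrt

def correlation(tp, tn, fp, fn):
    """Correlation between predicted and actual class for a 2x2 confusion matrix.

    Assumed form: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

def error_rate(tp, tn, fp, fn):
    """Fraction of fitness cases that are misclassified."""
    return (fp + fn) / (tp + tn + fp + fn)

# Out-of-sample counts quoted in the text (17,158 fitness cases each).
run1_gen19 = dict(tp=3549, tn=11593, fp=1023, fn=993)
run2_gen20 = dict(tp=3655, tn=11150, fp=1466, fn=887)

for name, counts in (("run 1, generation 19", run1_gen19),
                     ("run 2, generation 20", run2_gen20)):
    print(f"{name}: C = {correlation(**counts):.4f}, "
          f"error = {100 * error_rate(**counts):.1f}%")
```

Under this assumption the sketch reproduces the reported 0.6988 and 11.7% for generation 19 of run 1, and 0.6638 and 13.7% for generation 20 of run 2.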
Our goal in this chapter and chapters 18 and 19 was to illustrate how
genetic programming can be used to evolve complicated multi-branch
programs capable of solving problems from the real world. Our goal was not
actually to produce the best possible solution for any particular problem. Performance can be improved for many problems by using various post-processing techniques. The outputs produced by neural networks for practical
problems are frequently wrapped in a filter that applies various simple rules
to eliminate manifest errors. These post-processing techniques are, of course,
equally applicable to the programs evolved by genetic programming. For
example, whenever one or two isolated residues in a protein are classified
differently from numerous neighboring residues, then the isolated residues
can be reclassified to conform to their neighborhood. The second transmembrane domain of D3DR_MOUSE runs from positions 67-92; however, there
are two isolated false negatives at positions 77-78 in figure 20.10. Those two
residues are, in fact, clearly part of the transmembrane domain.
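A post-processing rule of this kind might be sketched as follows. This is only an illustration in Python, not a filter used in the runs reported here. The function name, the threshold of two residues, and the requirement of at least five agreeing residues on each side (standing in for "numerous neighboring residues") are all assumptions. The toy prediction vector is built solely from the positions mentioned above and assumes two-class (0/1) labels.

```python
from itertools import groupby

def smooth_isolated_runs(labels, max_run=2, min_flank=5):
    """Reclassify short interior runs that disagree with both long neighbors.

    labels    : sequence of 0/1 class labels, one per residue
    max_run   : longest run that counts as "isolated" (assumed: 2)
    min_flank : minimum length required of each neighboring run (assumed: 5)
    Decisions are based on the original runs, so flips do not cascade.
    """
    runs = [(value, len(list(group))) for value, group in groupby(labels)]
    smoothed = []
    for i, (value, length) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        if (interior and length <= max_run
                and runs[i - 1][1] >= min_flank
                and runs[i + 1][1] >= min_flank):
            value = runs[i - 1][0]  # conform to the surrounding neighborhood
        smoothed.extend([value] * length)
    return smoothed

# Toy prediction: domain predicted at positions 67-92 except for 77-78.
predicted = [0] * 66 + [1] * 10 + [0] * 2 + [1] * 14 + [0] * 20
smoothed = smooth_isolated_runs(predicted)
print(sum(predicted), sum(smoothed))
```

In this toy case the two isolated residues at positions 77-78 are reclassified, so that the predicted domain becomes the unbroken stretch 67-92.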
21 Evolutionary Selection of the Architecture
of the Program
So far in this book, whenever we have applied genetic programming with
automatically defined functions to a problem, we first made a group of architectural choices for the yet-to-be-evolved overall programs. We called these
architectural choices the sixth major step in preparing to use genetic programming. As previously mentioned, the sixth major step involves determining
(a) the number of function-defining branches,
(b) the number of arguments possessed by each function-defining branch,
and
(c) if there is more than one function-defining branch, the nature of the hierarchical references (if any) allowed between the function-defining
branches.
Four different ways of making these architectural choices have been previously described (chapter 7):
• prospective analysis of the nature of the problem,
• seemingly sufficient capacity,
• affordable capacity, and
• retrospective analysis of the results of actual runs.
We saw in sections 7.4 and 7.5 that, regardless of which of 15 architectures
were employed, genetic programming with automatically defined functions
was still capable of solving the even-5-parity problem. Moreover, regardless
of the architectural choice, less computational effort was required with automatically defined functions than without them for that problem.
The above facts are comforting in the sense that they offer four ways to
perform the sixth major step and that they suggest that genetic programming can solve some problems in spite of a bad architectural choice (albeit
more slowly).
But suppose one is unable or unwilling to make these architectural choices.
This chapter and chapters 22-25 demonstrate a way by which the architecture of the overall program can be evolutionarily selected in a competitive
fitness-driven process during a run of genetic programming at the same time
as the problem is being solved. That is, the solution to the problem will consist
of a yet-to-be-evolved result-producing branch that can call a yet-to-be-determined number of yet-to-be-evolved automatically defined functions (each
taking a yet-to-be-chosen number of arguments).
The function set of the result-producing branch of the overall program
that solves the problem will consist of the primitive functions of the problem along with a yet-to-be-determined number of yet-to-be-evolved automatically defined functions. The function set of each function-defining
branch will consist of the primitive functions of the problem along with
whatever other automatically defined functions, if any, that function-defining branch is entitled to reference hierarchically (in accordance with
our usual convention for numbering the automatically defined functions
from left to right). The terminal set of the result-producing branch will
consist of the actual variables of the problem. The terminal set of each
function-defining branch will consist of the now-chosen number of dummy
variables appropriate for that branch.
Implementation of this evolutionary method of selecting the architecture
of the overall program is accomplished by making the following two changes
from the general method of implementing genetic programming with
automatically defined functions described in chapter 4 and used heretofore
in this book:
• The initial random population is created so as to be architecturally diverse
(rather than architecturally uniform).
• The way of assigning types to noninvariant points of an overall program is
point typing (rather than branch typing). This has a concomitant impact on
the way of performing crossover.
In this chapter and chapters 22-25, the architecture of the eventual solution
to the problem is not preordained by the user during the preparatory steps.
Instead, it emerges from a competitive fitness-driven process that occurs during the run at the same time as the problem is being solved. The architecture
of each offspring produced by crossover will be the architecture of the receiving parent of the pair of parents. The evolutionary fitness-driven process will
cause certain suitably configured individuals to prosper in the population;
at the same time, individuals in the population with architectures that are
less suitable for the problem environment will tend to wither away in the
population.
When the initial random population is created in this evolutionary
method, generation 0 contains programs with different architectures. That
is, the number of automatically defined functions and the number of arguments that they each possess can differ from program to program within
the population. The different architectures range over various potentially
useful architectures. Each program is evaluated for fitness and selected to
participate in the genetic operations, such as crossover, on the basis of its
fitness in the usual way.
Because the population is architecturally diverse, the parents selected
to participate in the crossover operation will usually possess different
numbers of automatically defined functions. Moreover, an automatically
defined function with a certain name (e.g., ADF2) belonging to one parent
will often possess a different number of arguments than the same-named
automatically defined function belonging to the other parent (if indeed
ADF2 is present at all).
If, hypothetically, branch typing were to be used on an architecturally
diverse population, the crossover operation would be virtually hamstrung;
hardly any crossovers could occur. Point typing (described below in section
21.2) cures this potential difficulty.
Structure-preserving crossover with point typing permits robust recombination while simultaneously guaranteeing that any pair of architecturally
different parents will produce syntactically and semantically valid offspring.
When genetic material is inserted into the receiving parent during structure-preserving crossover with point typing, the offspring inherits its architecture
from the receiving parent and is guaranteed to be syntactically and semantically valid.
21.1 CREATION OF THE INITIAL RANDOM POPULATION
When we are using the evolutionary method of determining the architecture
while solving a problem, the creation of an individual program in the initial
random population begins with a random choice of the number of automatically defined functions, if any, that will belong to the program. Then a series
of independent random choices is made for the number of arguments possessed by each automatically defined function, if any, in the program. All of
these random choices are made within a wide (but limited) range that
includes every number that might reasonably be thought to be useful for the
problem at hand. Zero is included in the range of choices for the number of
automatically defined functions, so the initial random population also includes
some programs without any automatically defined functions.
Once the number of automatically defined functions is chosen for a particular overall program, the automatically defined functions, if any, are systematically named in the usual sequential manner from left to right. For
example, if a particular newly created program has three automatically
defined functions, they are named ADF0, ADF1, and ADF2.
We first use the Boolean even-5-parity problem to illustrate the evolution
of architecture.
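For reference, the target function and fitness measure of this problem can be sketched in a few lines. The sketch below is in Python rather than LISP and uses the usual definition of the even-5-parity function (true when an even number of its five Boolean arguments are true). The helper names `raw_fitness` and `standardized_fitness` are illustrative, and a candidate program is represented simply as a Python callable.

```python
from itertools import product

def even_5_parity(d0, d1, d2, d3, d4):
    """True when an even number of the five Boolean inputs are true."""
    return (d0 + d1 + d2 + d3 + d4) % 2 == 0

# All 2**5 = 32 combinations of the five Boolean arguments.
FITNESS_CASES = list(product([False, True], repeat=5))

def raw_fitness(program):
    """Number of fitness cases (hits) on which `program` matches the target."""
    return sum(program(*case) == even_5_parity(*case) for case in FITNESS_CASES)

def standardized_fitness(program):
    """Number of mismatches over the 32 fitness cases; 0 is a perfect score."""
    return len(FITNESS_CASES) - raw_fitness(program)

def always_true(d0, d1, d2, d3, d4):
    """A trivial candidate used only to exercise the fitness measure."""
    return True

print(raw_fitness(always_true), standardized_fitness(always_true))
```

A constant program such as `always_true` is correct on exactly half of the 32 fitness cases, which is why parity problems offer so little gradient to programs that ignore some of their inputs.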
The range of possibly useful numbers of arguments for the automatically
defined functions cannot, in general, be predicted with certainty for an arbitrary problem. One can conceive of hypothetical problems involving only a
few actual variables for which it might be useful to have an automatically
defined function that takes a much larger number of arguments. However,
our focus is primarily on solving problems by decomposing them into problems of lower dimensionality. Accordingly, it is reasonable to cap the range of
the number of probably-useful arguments for each automatically defined function by the number of actual variables of the problem. There is no guarantee
that this cap (motivated by our desire to decompose problems) is necessarily
optimal, desirable, or sufficient to solve a given problem. Conceivably, a wider
range may be necessary for a particular problem. (If this were the case, there
is no reason not to use a wider range.)
In any event, practical considerations concerning computer resources play
an important role in setting the upper bound on the number of arguments to
be permitted. In the case of the even-5-parity problem, there are five actual
variables for the problem, D0, D1, D2, D3, and D4. When we apply the above
cap, the range of the number of arguments for each automatically defined
function for the even-5-parity problem is from zero to five.
The range of potentially useful numbers of automatically defined functions cannot, in general, be predicted with certainty for an arbitrary problem.
The number of automatically defined functions does not necessarily bear any
relation to the dimensionality of the problem. However, once again, considerations of computer resources play a controlling role in setting the upper
bound on the number of automatically defined functions to be permitted. A
range of between zero and five automatically defined functions provides seemingly sufficient capacity to solve the even-5-parity problem.
Of course, if no consideration need be given to computer resources, one
might permit automatically defined functions with more arguments than there
are actual variables for the problem and one might permit still more automatically defined functions than just specified.
In practice, a zero-argument automatically defined function may not be a
meaningful option. If an automatically defined function has no access to the
actual variables of the problem (consistent with the convention used throughout this book), has no dummy variables, does not contain any side-effecting
primitive functions, and does not contain any random constants, nothing is
available to serve as terminals (leaves) of the program tree in the body of such
a zero-argument automatically defined function. In the floating-point, integer, and certain other domains, it may be useful to include random constants
because a zero-argument automatically defined function can be used to create an evolvable constant that can then be repeatedly called from elsewhere
in the overall program (i.e., a let, as described in section 5.4). However, for
the special case of the Boolean domain, the two possible Boolean constants (T
or NIL) have limited usefulness because all compositions of these two constants merely evaluate to one of these two values. Consequently, we have
started the range of the number of arguments for each automatically defined
function for Boolean problems at one, instead of zero.
If we adopt the ranges described above for the even-5-parity problem, there
are six possibilities for the number of automatically defined functions and
five possibilities for the number of arguments for each automatically
defined function. When there are no automatically defined functions, there
is only one possible argument map for the automatically defined functions, namely the map {}. When there is exactly one automatically defined
function in the overall program, there are five possible argument maps
for that automatically defined function, namely {1}, {2}, {3}, {4}, and {5}.
When there are exactly two automatically defined functions, there are 25
possible argument maps for the automatically defined functions. In all,
there are 1 + 5 + 25 + 125 + 625 + 3,125 = 3,906 possible argument maps for programs subject to the constraints described above.
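The count of 3,906 can be confirmed by direct enumeration. The following Python sketch is for illustration only; it represents an argument map as a tuple of argument counts, and the constant and function names are assumptions.

```python
from itertools import product

MAX_ADFS = 5              # between zero and five ADFs
ARG_CHOICES = range(1, 6)  # each ADF takes between one and five arguments

def all_argument_maps(max_adfs=MAX_ADFS, arg_choices=ARG_CHOICES):
    """Yield every argument map as a tuple, e.g. () or (3, 5)."""
    for n_adfs in range(max_adfs + 1):
        yield from product(arg_choices, repeat=n_adfs)

maps = list(all_argument_maps())
counts_by_size = [sum(1 for m in maps if len(m) == n)
                  for n in range(MAX_ADFS + 1)]
print(counts_by_size)  # [1, 5, 25, 125, 625, 3125]
print(len(maps))       # 3906
```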
A population size of 4,000 is used throughout this chapter. Given this choice,
not all of the 3,906 possible argument maps for the ADFs will, as a practical
matter, be represented in the initial random generation. If the population were
significantly larger than 4,000, virtually all of the 3,906 possible argument
maps for the ADFs would likely be represented in generation 0. In a population of 4,000, there are about 666 initial random programs with each of the six
possible numbers of automatically defined functions (between zero and five).
Approximately a fifth of the automatically defined functions have each of the
five possible numbers of arguments (between one and five).
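The architectural part of creating generation 0 can be sketched as follows. This Python fragment is a hypothetical illustration, not the original implementation. It assumes that the number of automatically defined functions and the number of arguments of each are drawn uniformly from their ranges (which is what the proportions quoted above suggest), and it omits the creation of the bodies of the branches.

```python
import random
from collections import Counter

def random_argument_map(rng, max_adfs=5, min_args=1, max_args=5):
    """Choose the number of ADFs, then independently the arguments of each."""
    n_adfs = rng.randint(0, max_adfs)  # zero ADFs is a permitted choice
    return tuple(rng.randint(min_args, max_args) for _ in range(n_adfs))

def adf_names(argument_map):
    """Name the ADFs sequentially from left to right: ADF0, ADF1, ..."""
    return [f"ADF{i}" for i in range(len(argument_map))]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
population_maps = [random_argument_map(rng) for _ in range(4000)]

adf_count_histogram = Counter(len(m) for m in population_maps)
distinct_maps = len(set(population_maps))

print(sorted(adf_count_histogram.items()))
print(distinct_maps, "distinct argument maps out of 3,906 possible")
```

Because only about one-sixth of the 4,000 programs have five automatically defined functions, the 3,125 possible five-function argument maps cannot all appear, which is why generation 0 covers only part of the space of architectures.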
The terminal set for the result-producing branch, T_rpb, for a program in the
population for the Boolean even-5-parity problem is
T_rpb = {D0, D1, D2, D3, D4}.
The terminal set for each automatically defined function is derived from
the argument map of the overall program.
For example, if the argument map is {3, 5}, there are two automatically
defined functions in the overall program. The terminal set for ADF0 is
T_adf0 = {ARG0, ARG1, ARG2},
and the terminal set for ADF1 is
T_adf1 = {ARG0, ARG1, ARG2, ARG3, ARG4}.
The function set for the result-producing branch, F_rpb, is the union of {AND,
OR, NAND, NOR} and whatever automatically defined functions are present in
that particular program. Thus, when there are no automatically defined
functions,
F_rpb = {AND, OR, NAND, NOR}
with an argument map for this function set of
{2, 2, 2, 2}.
However, when there are five automatically defined functions, the function
set for the result-producing branch is
F_rpb = {ADF0, ADF1, ADF2, ADF3, ADF4, AND, OR, NAND, NOR}
with an argument map for this function set of
{k_0, k_1, k_2, k_3, k_4, 2, 2, 2, 2},
where k_0, k_1, k_2, k_3, and k_4 are the number of arguments possessed by ADF0,
ADF1, ADF2, ADF3, and ADF4, respectively, in that particular individual.
The function set, F_adf0, for ADF0 (if present in a particular program in the
population) consists merely of the primitive functions of the problem:
F_adf0 = {AND, OR, NAND, NOR}
with an argument map for this function set of
{2, 2, 2, 2}.
Each automatically defined function can refer hierarchically to any already-defined function-defining branches belonging to the program. For example,
the function set, F_adf1, for ADF1 (if present) is
F_adf1 = {ADF0, AND, OR, NAND, NOR}
with an argument map for this function set of
{k_0, 2, 2, 2, 2},
where k_0 is the number of arguments possessed by ADF0.
Similarly, the function set for each successive automatically defined function (if present) is the union of the function set of the previous automatically
defined function and the name of the previous automatically defined function. For example, the function set, F_adf4, for ADF4 (if present) is
F_adf4 = {ADF0, ADF1, ADF2, ADF3, AND, OR, NAND, NOR}
with an argument map for this function set of
{k_0, k_1, k_2, k_3, 2, 2, 2, 2},
where k_0, k_1, k_2, and k_3 are the number of arguments possessed by ADF0,
ADF1, ADF2, and ADF3, respectively.
The random method of creating the initial population determines whether
the body of any particular function-defining branch of any particular program in the population actually calls all, none, or some of the automatically
defined functions which it is theoretically permitted to call hierarchically.
Subsequent crossovers may change the body of a particular function-defining branch and thereby change the set of other automatically defined
functions that the branch actually calls hierarchically. Thus, the function-defining branches have the ability to organize themselves into arbitrary
disjoint hierarchies of dependencies among the automatically defined functions. For example, within an overall program with five automatically
defined functions at generation 0, ADF4 might actually refer only to ADF2
and ADF3, with ADF2 and ADF3 not referring at all to either ADF0 or ADF1;
and ADF1 might refer only to ADF0. In this situation, there would be two
disjoint hierarchies of dependencies. A subsequent crossover might change
this organization so that ADF3 might then refer to ADF0 (but still not
to ADF1). After this crossover, a different hierarchy of dependencies would
exist. Any allowable (i.e., noncircular) hierarchy of dependencies may
thus be created in generation 0 or created by crossover during the evolutionary process.
Tables 21.1 and 21.2 summarize the key features of the even-5-parity problem with evolution of architecture. The tableau without ADFs provides general information about the problem and applies to those individuals in the
population that happen to have no automatically defined functions. The
tableau with ADFs applies to multi-branch individuals.
Table 21.1 Tableau without ADFs for the even-5-parity problem with evolution of
architecture.
Objective: Find a program that produces the value of the Boolean
even-5-parity function as its output when given the
values of the five independent Boolean variables as
input.
Terminal set without ADFs: D0, D1, D2, D3, and D4.
Function set without ADFs: AND, OR, NAND, and NOR.
Fitness cases: All 2^5 = 32 combinations of the five Boolean arguments
D0, D1, D2, D3, and D4.
Raw fitness: The number of fitness cases for which the value
returned by the program equals the correct value of the
even-5-parity function.
Standardized fitness: The standardized fitness of a program is the sum, over
the 2^5 = 32 fitness cases, of the Hamming distance
(error) between the value returned by the program and
the correct value of the Boolean even-5-parity function.
Hits: Same as raw fitness.
Wrapper: None.
Parameters: M = 4,000. G = 51.
Success predicate: A program scores the maximum number of hits.
Table 21.2 Tableau with ADFs for the even-5-parity problem with evolution of
architecture.
Objective: Find a program that produces the value of the Boolean
even-5-parity function as its output when given the
values of the five independent Boolean variables as
input.
Architecture of the overall program with ADFs: One result-producing branch and between zero and
five ADFs, each taking between one and five
arguments. Each ADF may hierarchically refer to a
lower numbered ADF (if any).
Parameters: Point typing (section 21.2).
Terminal set for the result-producing branch: D0, D1, D2, D3, and D4.
Function set for the result-producing branch: AND, OR, NAND, NOR, and whatever ADFs are
present (if any) in the program.
Terminal set for the function-defining branch of each ADF (if any): Each ADF takes a variable number of dummy variables
between one and five.
Function set for the function-defining branch of each ADF (if any): AND, OR, NAND, NOR, and each lower numbered
ADF (if any).
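The way in which the terminal set and function set of every branch follow from the argument map of a program can be sketched as follows. The sketch is in Python and is illustrative only; the function name `branch_specs`, the branch label "RPB", and the dictionary layout are assumptions rather than part of the method as originally implemented.

```python
PRIMITIVES = {"AND": 2, "OR": 2, "NAND": 2, "NOR": 2}
ACTUAL_VARIABLES = ["D0", "D1", "D2", "D3", "D4"]

def branch_specs(argument_map):
    """Return {branch name: (terminal list, {function name: arity})}.

    argument_map is a tuple such as (3, 5): ADF0 takes 3 arguments and
    ADF1 takes 5.
    """
    adf_arity = {f"ADF{i}": k for i, k in enumerate(argument_map)}
    specs = {}
    for i, k in enumerate(argument_map):
        terminals = [f"ARG{j}" for j in range(k)]  # dummy variables only
        # ADFi may refer only to lower-numbered ADFs (left-to-right convention).
        callable_adfs = {f"ADF{j}": adf_arity[f"ADF{j}"] for j in range(i)}
        specs[f"ADF{i}"] = (terminals, {**callable_adfs, **PRIMITIVES})
    # The result-producing branch sees the actual variables and every ADF.
    specs["RPB"] = (list(ACTUAL_VARIABLES), {**adf_arity, **PRIMITIVES})
    return specs

for branch, (terminals, functions) in branch_specs((3, 5)).items():
    print(branch, terminals, functions)
```

For the argument map {3, 5} this yields three dummy variables for ADF0, five for ADF1, a function set for ADF1 that includes a three-argument ADF0, and a function set for the result-producing branch that includes both automatically defined functions.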
21.2 POINT TYPING FOR STRUCTURE-PRESERVING CROSSOVER
The basic idea of structure-preserving crossover is that any noninvariant point
anywhere in the overall program is randomly chosen, without restriction, as
the crossover point of the first parent; however, once the crossover point of
the first parent has been chosen, the crossover point of the second parent is
randomly chosen from among points of the same type. The typing of the
noninvariant points of an overall program is done so that the structure-preserving crossover operation will always produce valid offspring.
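The basic selection rule can be sketched as follows, under the assumption that each noninvariant point carries a type label (with branch typing, simply the name of the branch containing it). The Python representation of a parent as a list of (point identifier, type) pairs and the function name are hypothetical.

```python
import random

def choose_crossover_points(first_parent, second_parent, rng):
    """Pick (point1, point2) with matching types, or None if none exists.

    Each parent is a list of (point_id, point_type) pairs covering its
    noninvariant points. The first point is unrestricted; the second is
    drawn only from points of the same type.
    """
    point1, type1 = rng.choice(first_parent)
    same_type = [pid for pid, ptype in second_parent if ptype == type1]
    if not same_type:
        return None  # no point of this type exists in the second parent
    return point1, rng.choice(same_type)

parent_a = [(0, "ADF0"), (1, "ADF0"), (2, "RPB")]
parent_b = [(0, "ADF0"), (1, "RPB"), (2, "RPB")]
print(choose_crossover_points(parent_a, parent_b, random.Random(1)))
```

When the second parent has no point of the required type, the sketch returns None. With branch typing in an architecturally diverse population this case would arise often, for example when the first point lies in a branch that the second parent does not possess.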
Point typing is used when the architecture of the overall program is
being evolutionarily selected while the problem is being solved. The crossover point of the first (contributing) parent is chosen, without restriction,
in the usual manner for structure-preserving crossover. The types produced by branch typing are insufficiently descriptive and overly constraining in an architecturally diverse population. Note that after a crossover is
performed, each call to an automatically defined function actually appearing in the crossover fragment from the contributing parent will no longer
refer to the automatically defined function of the contributing parent, but
instead will refer to the same-named automatically defined function of
the receiving parent. Consequently, the restriction on the choice for the
crossover point of the receiving (second) parent must be different for point
typing. The crossover point of the receiving (second) parent (called the
point of insertion) must be chosen from the set of points such that the crossover fragment from the contributing (first) parent "has meaning" if the
crossover fragment from the contributing parent were to be inserted at
the chosen point of insertion.
Point typing is governed by three general principles.
First, every terminal and function actually appearing in the crossover fragment from the contributing parent must be in the terminal set or function set
of the branch of the receiving parent containing the point of insertion. This
first general principle applies to actual variables of the problem, dummy
variables, random constants, primitive functions, and automatically defined
functions.
Second, the number of arguments of every function actually appearing in
the crossover fragment from the contributing parent must equal the number
of arguments specified for the same-named function in the argument map of
the branch of the receiving parent containing the insertion point. This second
general principle governing point typing applies to all functions. However,
the emphasis is on the automatically defined functions because the same function name is used to represent entirely different functions with differing numbers of arguments for different individuals in the population.
Third, all other syntactic rules of construction of the problem must be
satisfied.
For clarity and ease of implementation, the three general principles above
governing point typing can be restated as the following seven conditions. A
crossover fragment from a contributing parent is said to have meaning at a
chosen crossover point of the receiving parent if the following seven conditions are satisfied:
(1) All the actual variables of the problem, if any, actually appearing in the
crossover fragment from the contributing parent must be in the terminal
set of the branch of the receiving parent containing the point of insertion.
(2) All the dummy variables, if any, actually appearing in the crossover
fragment from the contributing parent must be in the terminal set of the
branch of the receiving parent containing the point of insertion.
(3) All the automatically defined functions, if any, actually appearing in the
crossover fragment from the contributing parent must be in the function
set of the branch of the receiving parent containing the point of insertion.
(4) All the automatically defined functions, if any, actually appearing in the
crossover fragment from the contributing parent must have exactly the
number of arguments specified for that automatically defined function
for the branch of the receiving parent containing the point of insertion.
(5) All functions (other than automatically defined functions) must also satisfy
conditions (3) and (4).
(6) All terminals (other than dummy variables and actual variables of the
problem already mentioned in conditions (1) and (2)), if any, actually
appearing in the crossover fragment from the contributing parent must
be in the terminal set of the branch of the receiving parent containing the
point of insertion.
(7) All other syntactic rules of construction of the problem must be satisfied.
We now comment on these seven conditions.
The first condition will usually be satisfied if both crossover points are from
the result-producing branch of their respective parents. For the even-5-parity
problem, the actual variables of the problem, D0, D1, D2, D3, and D4, appear
in the result-producing branch. Moreover, they appear only in the result-producing branch in accordance with the convention used throughout this book
that the actual variables of the problem do not appear in function-defining
branches. To the extent that this convention is observed, this first condition
applies only to result-producing branches. However, if the actual variables of
the problem appear in the function-defining branches, this condition must
also be satisfied in such function-defining branches.
The second condition requires that the argument list of the branch of the
receiving parent into which the crossover fragment from the contributing
parent is being inserted must contain all the dummy variables, if any, contained in the crossover fragment. Since dummy variables may not appear in
the result-producing branch, this second condition applies only to crossovers
between function-defining branches.
There are several implications of the third condition which requires that all
automatically defined functions in the crossover fragment must be in the
function set of the branch in which they are to be inserted. This condition
specifies that it is only permissible to have an automatically defined function
in a crossover fragment if that automatically defined function has already
been defined at the point of insertion of the receiving parent. In the context of
the even-5-parity problem where each function-defining branch is allowed to
refer hierarchically to every already-defined (i.e., lower numbered) automatically defined function, this means that the number of every automatically
defined function referenced from within a crossover fragment must be lower
than the number of the automatically defined function being defined by the
branch of the receiving parent containing the point of insertion. For example,
a crossover fragment containing a reference to ADF1 may be inserted into the
branch defining ADF2 of the receiving parent, but may not be inserted into
branches defining ADF0 (or ADF1) of the receiving parent.
The fourth condition requires that the number of arguments taken by each
automatically defined function in a crossover fragment from the contributing
parent exactly match the number of arguments in the argument list of the
branch of the receiving parent that defines the same-named automatically
defined function. For example, a crossover fragment from a contributing parent containing a four-argument call to ADF0 cannot be inserted into any branch
of a receiving parent unless the branch of the receiving parent that defines
ADF0 specifies that ADF0 can take exactly four arguments.
The fifth condition merely restates the omnipresent requirement that a non-ADF function cannot be imported into a branch unless it is permitted to be
included in that branch by the function set of that branch. This condition and
the sixth condition are presented separately in order to highlight the issues
related to automatically defined functions.
The sixth condition similarly restates the general requirement that any other
type of terminal (e.g., random constants, zero-argument primitive functions
being treated as terminals) cannot be imported into a branch unless it is permitted to be included in that branch by the terminal set of that branch.
The seventh condition covers the fact that all of the other syntactic rules of
construction of the problem must always continue to be satisfied. Although
there are no syntactic constraints in the even-5-parity problem beyond those
required to implement the automatically defined functions themselves, other
problems, such as those involving decision trees, have additional syntactic
constraints.
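The seven conditions reduce to a membership test on terminals and an arity test on functions. The following Python sketch is one illustrative way to express that test; it is not the implementation used for the runs reported in this chapter. The names Branch, fragment_symbols, and has_meaning are hypothetical, programs are assumed to be nested tuples of the form (function, argument, ...), and condition (7) is represented only by an optional predicate supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Branch:
    """Hypothetical description of one branch of the receiving parent."""
    terminals: Set[str]        # terminal set of the branch
    arities: Dict[str, int]    # function set together with its argument map


def fragment_symbols(tree):
    """Collect the terminals and (function, argument count) pairs of a fragment."""
    terminals, calls = set(), set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            calls.add((node[0], len(node) - 1))
            stack.extend(node[1:])
        else:
            terminals.add(node)
    return terminals, calls


def has_meaning(fragment, branch: Branch,
                extra_rule: Optional[Callable] = None) -> bool:
    terminals, calls = fragment_symbols(fragment)
    if not terminals <= branch.terminals:        # conditions (1), (2), (6)
        return False
    for name, nargs in calls:                    # conditions (3), (4), (5)
        if branch.arities.get(name) != nargs:
            return False
    # Condition (7): any further syntactic rule is assumed to be supplied.
    return extra_rule is None or bool(extra_rule(fragment, branch))


# A branch defining a two-argument ADF1 that may call a three-argument ADF0.
adf1 = Branch({"ARG0", "ARG1"},
              {"ADF0": 3, "AND": 2, "OR": 2, "NAND": 2, "NOR": 2})
print(has_meaning(("NAND", "ARG0", "ARG1"), adf1))
print(has_meaning(("AND", "ARG2", "ARG0"), adf1))
```

A single dictionary lookup covers both the function-set test and the argument-count test, because a function absent from the branch has no entry in the argument map.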
Figure 21.1 shows an illustrative program, called parent A. Parent A has
two function-defining branches and one result-producing branch. The first
function-defining branch of parent A defines a three-argument function (ADF0)
and its second function-defining branch defines a two-argument function
(ADF1). The argument map for the automatically defined functions belonging to this overall program is {3, 2}. ADF1 can refer hierarchically to ADF0. If
we were planning to perform structure-preserving crossover using branch
typing, there would only be three types of points in the overall program. The
points of the body of the three-argument ADF0 would be of type 7a; the points
of the two-argument ADF1 would be of type 7b; and the points of the result-producing branch would be of type 8.
Figure 21.1 Parent A with an argument map of {3, 2} for its automatically defined functions. The argument lists shown are (ARG0 ARG1 ARG2) for ADF0 and (ARG0 ARG1) for ADF1.
Figure 21.2 shows another illustrative program, called parent B, with an
argument map of {3, 2, 2} for its automatically defined functions. Parent B has
three function-defining branches and one result-producing branch. The first
function-defining branch defines a three-argument function (ADF0); the second function-defining branch defines a two-argument function (ADF1); and
the third function-defining branch defines a two-argument function (ADF2).
The usual hierarchy of dependencies applies here in that ADF1 can refer hierarchically to ADF0 and ADF2 can refer hierarchically to both ADF0 and ADF1.
If we were planning to perform structure-preserving crossover using branch
typing, the points of ADF0, ADF1, ADF2, and the result-producing branch
would be of different types.
We now illustrate structure-preserving crossover with point typing using
parents A and B.
Suppose point 101 (labeled NAND) from ADF0 is chosen as the crossover
point from contributing parent A (figure 21.1). The crossover fragment rooted
at point 101 is (NAND ARG0 ARG1). Now suppose we consider the eligibility
of a point such as 207 (labeled NAND) of ADF1 of parent B (figure 21.2) to be a
point of insertion. The eligibility of point 207 is determined by examining the
terminal set, the function set, and the ordered set containing the number of
arguments associated with each function in the function set of the receiving
parent.
The terminal set for ADF1 of the receiving parent (parent B) is
T_adf1 = {ARG0, ARG1}.
The function set for ADF1 of the receiving parent (parent B) is
F_adf1 = {ADF0, AND, OR, NAND, NOR}
with an argument map for this function set of
{3, 2, 2, 2, 2}.
Figure 21.2 Parent B has an argument map of {3, 2, 2} for its automatically defined functions.
Point 207 is eligible to be a crossover point in parent B to receive the crossover fragment (NAND ARG0 ARG1) rooted at point 101 of parent A because
both ARG0 and ARG1 are in the terminal set, T_adf1, of ADF1 of receiving parent
B, because NAND is in the function set, F_adf1, of ADF1 of receiving parent B,
and because the NAND function from contributing parent A takes the same
number of arguments (two) as the same-named function, NAND, takes in
receiving parent B.
In fact, all eight points (points 207 through 214) are eligible to be chosen as
the point of insertion of parent B because the crossover fragment (NAND ARG0
ARG1) is valid throughout ADF1 of parent B. In addition, all ten points of
ADF2 of parent B (points 215 through 224) are also eligible to be chosen as the
point of insertion of parent B because the crossover fragment chosen in parent A is also valid throughout ADF2. All seven points of ADF0 of parent B (200
through 206) are also acceptable points of insertion for this crossover fragment. Thus, when point typing is being used and the crossover fragment
rooted at point 101 of parent A is chosen, a total of 25 points of parent B are
eligible to receive this crossover fragment.
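The count of 25 eligible points can be reproduced mechanically. The sketch below is a minimal illustration, assuming that each branch of parent B is described by its range of point numbers, its terminal set, and its argument map; the table PARENT_B and the helper eligible_points are hypothetical names, and the function set given for the result-producing branch is an assumption, since it plays no part in this example.

```python
PRIMITIVES = {"AND": 2, "OR": 2, "NAND": 2, "NOR": 2}

# Branches of parent B (argument map {3, 2, 2}); point numbers follow figure 21.2.
PARENT_B = {
    "ADF0": (range(200, 207), {"ARG0", "ARG1", "ARG2"}, dict(PRIMITIVES)),
    "ADF1": (range(207, 215), {"ARG0", "ARG1"}, {"ADF0": 3, **PRIMITIVES}),
    "ADF2": (range(215, 225), {"ARG0", "ARG1"},
             {"ADF0": 3, "ADF1": 2, **PRIMITIVES}),
    # Assumed function set for the result-producing branch.
    "RPB": (range(225, 233), {"D0", "D1", "D2", "D3", "D4"},
            {"ADF0": 3, "ADF1": 2, "ADF2": 2, **PRIMITIVES}),
}


def eligible_points(frag_terminals, frag_calls, parent):
    """Return the points of `parent` at which the fragment has meaning."""
    points = []
    for point_range, terminals, arities in parent.values():
        terminals_ok = frag_terminals <= terminals
        calls_ok = all(arities.get(name) == n for name, n in frag_calls.items())
        if terminals_ok and calls_ok:
            points.extend(point_range)
    return points


# (NAND ARG0 ARG1), rooted at point 101 of parent A.
nand_points = eligible_points({"ARG0", "ARG1"}, {"NAND": 2}, PARENT_B)
# (AND ARG2 ARG0), rooted at point 104 of parent A.
and_points = eligible_points({"ARG2", "ARG0"}, {"AND": 2}, PARENT_B)
print(len(nand_points), and_points)
```

The first fragment is accepted by all three function-defining branches (7 + 8 + 10 points) and rejected by the result-producing branch, whose terminal set contains no dummy variables.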
If point 104 labeled AND from ADF0 is chosen as the crossover point of
parent A, no point within ADF1 or ADF2 of parent B is eligible to be chosen.
The reason is that the crossover fragment (AND ARG2 ARG0) that is rooted
at point 104 contains the dummy variable ARG2. ADF1 and ADF2 of parent B
both take only two dummy variables, ARG0 and ARG1. ARG2 is not in the
argument list of either ADF1 or ADF2 of parent B. The terminal set of ADF1
(shown above) does not contain ARG2. The terminal set of ADF2 of parent B
happens to be the same as that of ADF1 and also does not contain ARG2. The
second condition above would be violated for such a choice. In fact, the only
points of parent B that are eligible as points of insertion are the seven points
(200 through 206) of ADF0 of parent B.
If point 112 labeled AND from ADF1 is chosen as the crossover point of parent A, then all seven points of ADF0 of parent B (200 through 206), all eight
points of ADF1 of parent B (points 207 through 214), and all ten points of
ADF2 of parent B (points 215 through 224) are now eligible to be chosen as the
point of insertion of parent B because the crossover fragment (AND ARG1
ARG0) is valid throughout ADF0, ADF1, and ADF2 in parent B.
However, if point 108 labeled ADF0 from ADF1 is chosen from parent A as
the crossover point, no point within ADF0 of the second parent may be chosen. The function set for ADF0 of parent B, the receiving parent, is
F_adf0 = {AND, OR, NAND, NOR}
with an argument map for this function set of
{2, 2, 2, 2}
and this function set does not contain ADF0 (i.e., recursion is not permitted
here). The third condition above would be violated for such a choice. The
same applies to point 107 labeled NOR because ADF0 appears within the crossover fragment rooted at this point.
In contrast, all ten points of ADF2 of parent B (points 215 through 224) are
eligible since ADF2 is permitted to refer hierarchically to ADF0. The function
set for ADF2 of parent B is
F_adf2 = {ADF0, ADF1, AND, OR, NAND, NOR}
with an argument map for this function set of
{3, 2, 2, 2, 2, 2}.
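The function sets quoted for ADF0, ADF1, and ADF2 of parent B all follow from the argument map {3, 2, 2} and the rule that a function-defining branch may refer only to lower-numbered automatically defined functions. The following sketch derives them under that rule; the helper name branch_function_sets is hypothetical, and the four two-argument Boolean primitives are those of the even-5-parity problem.

```python
PRIMITIVES = {"AND": 2, "OR": 2, "NAND": 2, "NOR": 2}


def branch_function_sets(argument_map):
    """Function set (name -> argument count) of each function-defining branch.

    Branch i may call ADF0 .. ADF(i-1) only, so recursion is excluded.
    """
    sets = []
    for i in range(len(argument_map)):
        visible = {f"ADF{j}": argument_map[j] for j in range(i)}
        sets.append({**visible, **PRIMITIVES})
    return sets


for i, fset in enumerate(branch_function_sets([3, 2, 2])):
    print(f"ADF{i}:", list(fset), list(fset.values()))
```

The three lines printed list the function set and its argument map for ADF0, ADF1, and ADF2 in turn.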
The second condition is illustrated by considering points 225 through 232
from the result-producing branch of parent B. None of these points is eligible
to be chosen as the point of insertion for any of the above cases because all of
the crossover fragments contain dummy variables which are not in the terminal set of the result-producing branch.
The first condition can be illustrated by considering points 115 through
122 from the result-producing branch of parent A. If any point from the
result-producing branch of parent A is chosen as the crossover point of
parent A, then only points in the result-producing branch of parent B
(points 225 through 232) may be chosen as the point of insertion of the
second parent because the actual variables of the problem, D0, D1, D2, D3,
and D4, are, for this problem, in the terminal set of only the result-producing branch of the overall program.
Point typing imposes a directionality on structure-preserving crossover that
does not arise with branch typing. For example, when point 101 from ADF0 is
chosen as the crossover point of parent A, a point such as 211 of ADF1 of
parent B is eligible to be chosen as the point of insertion of parent B because
the crossover fragment (NAND ARG0 ARG1) is valid anywhere in ADF1 of
parent B. However, crossover is not possible from parent B to parent A, given
the selection of these two points. The subtree (ADF0 ARG0 ARG1 ARG1)
rooted at point 211 of parent B is not eligible for insertion at point 101 of ADF0
of parent A because ADF0 is not yet defined at point 101 of parent A. Because
of this asymmetry, the simplest approach to implementing point typing is to
produce only one offspring each time two parents are selected to participate
in crossover. Consequently, if crossover is being performed on 90% of the
population, 3,600 structure-preserving crossover operations with point typing
(each involving the selection of two parents on the basis of their fitness, with
reselection allowed) are required to produce 3,600 offspring for a given generation (rather than the 1,800 structure-preserving crossover operations with
branch typing).
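Both the asymmetry and the resulting count of crossover operations can be checked with a few lines. The sketch below is illustrative only: the helper fits is a hypothetical stand-in for the seven conditions, fragments are described by their terminals and their function calls, and the population size of 4,000 is the one used for the runs reported in this chapter.

```python
PRIMITIVES = {"AND": 2, "OR": 2, "NAND": 2, "NOR": 2}


def fits(frag_terminals, frag_calls, terminals, arities):
    """True if every terminal and every (function, arity) pair is permitted."""
    return (frag_terminals <= terminals
            and all(arities.get(f) == n for f, n in frag_calls.items()))


# A -> B: (NAND ARG0 ARG1) from ADF0 of parent A into ADF1 of parent B.
a_to_b = fits({"ARG0", "ARG1"}, {"NAND": 2},
              {"ARG0", "ARG1"}, {"ADF0": 3, **PRIMITIVES})
# B -> A: (ADF0 ARG0 ARG1 ARG1) from ADF1 of parent B into ADF0 of parent A.
b_to_a = fits({"ARG0", "ARG1"}, {"ADF0": 3},
              {"ARG0", "ARG1", "ARG2"}, PRIMITIVES)

population = 4000
offspring = population * 90 // 100       # crossover on 90% of the population
ops_point_typing = offspring             # one offspring per operation
ops_branch_typing = offspring // 2       # two offspring per operation
print(a_to_b, b_to_a, ops_point_typing, ops_branch_typing)
```

Producing one offspring per operation doubles the number of parent selections, which is one reason point typing costs more time per generation.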
Now suppose that the roles of parents A and B are reversed and parent B
(figure 21.2) becomes the contributing parent while parent A (figure 21.1)
becomes the receiving parent. There are three different possibilities for the
ten points of the branch defining ADF2 of parent B when they are chosen as
the crossover point.
First, if the point 216 labeled ADF1 from ADF2 of parent B is chosen as the
crossover point of the contributing parent, none of the eight points (107 through
114) of ADF1 of parent A and none of the seven points (100 through 106) of
ADF0 of parent A may be chosen because ADF1 is not allowed in ADF0 or
ADF1. The same applies to point 215 (labeled OR) because point 216 (labeled
ADF1) appears within the crossover fragment rooted at point 215. After
selecting the two parents to participate in crossover and choosing the crossover point from the contributing parent, an attempt is made to choose a valid
crossover point from the receiving parent. If the set of eligible points in the
second parent proves to be empty, the second parent is discarded and a new
selection is made for the second parent to mate with the first parent. Note that
no crossover ever fails completely due to the constraints imposed by point
typing because there is always at least one eligible point of insertion (e.g.,
when a program mates with itself).
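The reselection procedure just described can be sketched as a short loop. This is a hedged outline rather than the code used for the reported runs: select_parent, crossover_points, and eligible_points are hypothetical callables supplied by the caller, and termination relies on the observation above that some receiving parent (at worst the contributing parent itself) always offers an eligible point.

```python
import random


def point_typed_crossover(select_parent, crossover_points, eligible_points,
                          rng=random):
    """Choose (first parent, fragment point, second parent, insertion point).

    The second parent is discarded and reselected whenever it offers no
    eligible point of insertion for the chosen crossover fragment.
    """
    first = select_parent()
    fragment_point = rng.choice(crossover_points(first))
    while True:
        second = select_parent()
        candidates = eligible_points(first, fragment_point, second)
        if candidates:
            return first, fragment_point, second, rng.choice(candidates)


# Toy demonstration: parent "C" never offers an eligible point.
picks = iter(["B", "C", "C", "A"])
result = point_typed_crossover(
    select_parent=lambda: next(picks),
    crossover_points=lambda parent: [216],
    eligible_points=lambda f, p, s: [] if s == "C" else [115, 116],
    rng=random.Random(0),
)
print(result)
```

Only the insertion point is chosen after the eligibility test; the contributing parent and its crossover fragment are kept fixed while candidates for the second parent are drawn.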
Second, if the point 219 labeled ADF0 from ADF2 of parent B is chosen as
the crossover point of the contributing parent, any point of the eight points
(107 through 114) of ADF1 of parent A can now be chosen as the point of
insertion since the entire crossover fragment rooted at point 219 is allowed in
ADF1 of the receiving parent; however, it is still true that none of the seven
points (100 through 106) of ADF0 of parent A may be chosen because a reference to ADF0 is not permitted in ADF0.
Third, if points such as 217, 218, 220, 221, 222, 223, or 224 from ADF2 of
parent B are chosen, the constraints imposed by point typing permit any
point of ADF0 or ADF1 of parent A to be chosen. The unrelated convention
used throughout this book of never choosing a root as the point of insertion for a crossover fragment consisting only of a single terminal would
have the effect of preventing points 100 and 107 of parent A from being
chosen for terminal points 217, 218, 220, 221, 223, or 224 (but they could be
chosen for point 222).
Figure 21.3 shows parent C with an argument map of {4, 2} for its automatically defined functions. Parent C has two function-defining branches and one
result-producing branch. The first function-defining branch defines a four-argument function (ADF0); and the second function-defining branch defines
a two-argument function (ADF1).
Figure 21.3 Parent C has an argument map of {4, 2} for its automatically defined functions.
New situations arise when parent B (figure 21.2) is chosen to be the first
(contributing) parent while parent C (figure 21.3) is chosen to be the second
(receiving) parent.
If either point 211 (labeled ADF0) from ADF1 of contributing parent B or
point 219 (also labeled ADF0) from ADF2 of parent B is chosen as a crossover point of the contributing parent, then the crossover fragment contains a reference to a three-argument ADF0. No point in receiving parent C
is eligible to be chosen as the point of insertion for this fragment because
ADF0 takes four arguments in parent C and the occurrences of ADF0 in
these crossover fragments from parent B take only three arguments. That
is, the fourth condition above would be violated by either of these choices.
The same applies to points 207 and 215. Note that this ineligibility arises
only in relation to a particular receiving parent; these same points are eligible to be crossover points when the ADF0 belonging to the receiving
parent takes three arguments.
Similarly, points such as 227 (labeled ADF0) and 225 (labeled AND) from the
result-producing branch of parent B cannot find a home in parent C because
a crossover fragment rooted at those points would contain a three-argument
reference to ADFO.
Point typing enables genetic recombination to occur in an architecturally
diverse population. The architecture of an offspring produced by crossover is
always the same as the architecture of the receiving parent participating in
the crossover. However, it is possible that an individual with an architecture
appropriate for solving the problem and bodies that actually solve the problem can emerge during a run involving such an architecturally diverse population. We now demonstrate that this potential for the simultaneous evolution
of the architecture while solving a problem can be realized in connection with
the Boolean even-parity problems of orders five, four, and three.
21.3 Results for the Even-5-Parity Problem
We made 41 runs of the even-5-parity problem using the evolutionary method
of determining the architecture of the overall program. Twenty-six of these
runs (64%) produced a 100%-correct solution (scoring 32 hits) by generation
50, thus demonstrating that it is indeed possible to solve this problem without
prespecifying the architecture.
All of the 26 solutions employed one or more automatically defined functions even though it is, of course, possible to solve this problem without automatically defined functions.
Table 21.3 Distribution of architectures of the ADFs of 26 solutions to the even-5-parity problem with evolution of architecture.
Run   Number of ADFs   Argument map of ADFs   Generation when solved
1     1                {2}                    19
2     1                {3}                    16
3     2                {3, 3}                 27
4     2                {3, 5}                 5
5     2                {3, 5}                 10
6     2                {4, 2}                 9
7     2                {4, 3}                 18
8     2                {4, 4}                 23
9     2                {5, 3}                 33
10    3                {2, 1, 3}              13
11    3                {2, 4, 3}              18
12    3                {3, 1, 4}              37
13    3                {4, 3, 5}              16
14    3                {5, 2, 2}              17
15    3                {5, 5, 1}              9
16    3                {5, 5, 2}              10
17    4                {2, 4, 3, 5}           13
18    4                {3, 2, 2, 3}           40
19    4                {3, 4, 5, 5}           19
20    4                {4, 3, 1, 1}           9
21    4                {5, 3, 3, 5}           19
22    4                {5, 4, 2, 1}           9
23    4                {5, 5, 2, 5}           13
24    5                {2, 4, 5, 1, 5}        22
25    5                {3, 1, 2, 2, 2}        7
26    5                {3, 2, 3, 3, 3}        20
Table 21.3 shows the distribution of the number of automatically defined
functions and the number of arguments they each possess among 26 solutions
to the even-5-parity problem. The average number of automatically defined
functions for these 26 solutions is 3.08.
There is considerable variation among these 26 solutions as to their number of automatically defined functions and the number of arguments they
each possess. In fact, only one of the argument maps for the automatically
defined functions of these 26 solutions is repeated.
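The two summary statistics quoted for table 21.3 can be recomputed from the argument maps alone. The sketch below is a simple check, with the 26 argument maps transcribed from the table as printed; the variable names are illustrative.

```python
from collections import Counter

# Argument maps of the ADFs of the 26 solutions, transcribed from table 21.3.
ARGUMENT_MAPS = [
    (2,), (3,),
    (3, 3), (3, 5), (3, 5), (4, 2), (4, 3), (4, 4), (5, 3),
    (2, 1, 3), (2, 4, 3), (3, 1, 4), (4, 3, 5), (5, 2, 2), (5, 5, 1), (5, 5, 2),
    (2, 4, 3, 5), (3, 2, 2, 3), (3, 4, 5, 5), (4, 3, 1, 1),
    (5, 3, 3, 5), (5, 4, 2, 1), (5, 5, 2, 5),
    (2, 4, 5, 1, 5), (3, 1, 2, 2, 2), (3, 2, 3, 3, 3),
]

# The number of ADFs of a solution is the length of its argument map.
average_adfs = sum(len(m) for m in ARGUMENT_MAPS) / len(ARGUMENT_MAPS)
repeated = [m for m, n in Counter(ARGUMENT_MAPS).items() if n > 1]
print(round(average_adfs, 2), repeated)
```

The total of 80 automatically defined functions over 26 solutions gives the stated average, and {3, 5} is the only argument map that occurs more than once.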
In addition, an examination of the 451 best-of-generation individuals among
these 26 successful runs showed that every one of these best intermediate
programs employed automatically defined functions. In other words, when
the number of automatically defined functions is open to evolutionary determination, all of the solutions and all of the best-of-generation programs produced along the way employ automatically defined functions. The reason for
this is apparently that programs lacking automatically defined functions tend,
on average, to be at a selective disadvantage throughout the run because their
fitness, at any given generation, tends to lag that of competitors with automatically defined functions. Consequently, the programs lacking automatically defined functions are crowded out by the selective pressure exerted by
the competitive evolutionary process throughout the run. This observation
adds further support to main point 3 that automatically defined functions
improve the performance of genetic programming.
The average structural complexity, S_with, of solutions from the 26 successful runs (out of 41 runs) of the even-5-parity problem with evolution of architecture is 150.1 points.
Figure 21.4 presents the performance curves based on the 41 runs of the
even-5-parity function with evolution of architecture. The cumulative probability of success, P(M,i), is 54% by generation 23 and 64% by generation 50.
Figure 21.4 Performance curves for the even-5-parity problem with evolution of architecture
showing that E_with = 576,000 with ADFs. The curves are plotted against generation; the marked points are (5, 2%), (23, 54%), and (50, 64%), and the oval contains the numbers 23 and 576,000.
The two numbers in the oval indicate that if this problem is run through to
generation 23, processing a total of E_with = 576,000 (i.e., 4,000 x 24 generations
x 6 runs) individuals is sufficient to yield a solution to this problem with 99%
probability.
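The figure of six runs follows from the cumulative probability of success at generation 23. The sketch below reproduces the arithmetic, assuming independent runs that each succeed by the chosen generation with probability P(M,i), so that the number of runs needed to reach a probability z is the smallest integer R with 1 - (1 - P)^R >= z. The function names are illustrative.

```python
import math


def runs_required(p_success, z=0.99):
    """Smallest number of independent runs giving success probability >= z."""
    return math.ceil(math.log(1 - z) / math.log(1 - p_success))


def individuals_to_process(population, generation, p_success, z=0.99):
    """Population size x (generation + 1) x number of independent runs."""
    return population * (generation + 1) * runs_required(p_success, z)


runs = runs_required(0.54)
effort = individuals_to_process(4000, 23, 0.54)
print(runs, effort)
```

Generation 23 contributes 24 generations to the product because generation 0 is counted.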
The 576,000 individuals required to yield a solution to the even-5-parity
problem with 99% probability with the evolution of the architecture is about
twice the 272,000 individuals shown in table 7.2 to be required with either
two two-argument automatically defined functions or three three-argument
automatically defined functions. Although not as low as the optimal number
of 272,000, the 576,000 individuals is smaller than four of the 12 numbers in
table 7.2. It is only 63% of 912,000, the worst number in table 7.2.
However, because of point typing, implementation of the evolutionary
method of determining the architecture takes considerably more wallclock
time (about three times as much for this problem). Thus, simultaneously evolving the architecture along with solving the even-5-parity problem takes about
six times as much wallclock time as the evolution of the solution alone when
compared to the optimal value of 272,000 in table 7.2. Simultaneously evolving the architecture along with solving the problem takes about twice as much
wallclock time when compared to the worst value in table 7.2.
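The ratios quoted in the last two paragraphs can be recomputed directly. The sketch below assumes that the factor of about three in wallclock time applies per individual processed, so that relative wallclock time is that factor multiplied by the ratio of individuals; the variable names are illustrative.

```python
E_EVOLVED = 576_000      # individuals needed with evolution of architecture
E_BEST = 272_000         # best value in table 7.2
E_WORST = 912_000        # worst value in table 7.2
WALLCLOCK_FACTOR = 3     # "about three times as much", taken from the text

ratio_best = E_EVOLVED / E_BEST      # about twice as many individuals
ratio_worst = E_EVOLVED / E_WORST    # about 63% as many individuals

time_vs_best = WALLCLOCK_FACTOR * ratio_best
time_vs_worst = WALLCLOCK_FACTOR * ratio_worst
print(round(ratio_best, 2), round(ratio_worst, 2),
      round(time_vs_best, 1), round(time_vs_worst, 1))
```

The products are only as precise as the rough factor of three, which is why they are quoted as about six times and about twice.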
An examination of seven of the 26 successful runs illustrates several points
about the evolution of architecture.
We first examine the run that yields a solution on the earliest generation
(row 4 in table 21.3).
Generation 0 consists of randomly generated programs. The distribution
of the number of automatically defined functions among the 4,000 programs
of generation 0 of this run is, as expected, reasonably flat. There are 760 programs with no automatically defined functions, 646 with one, 652 with two,
652 with three, 658 with four, and 632 with five.
The best of generation 0 scores 18 hits and contains two automatically defined functions. ADF0 takes four arguments and ADF1 takes three arguments
so this program has an argument map for its ADFs of {4, 3}.
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
         (values (NAND (OR (NAND ARG3 ARG1) ARG1) (AND (NAND
           ARG0 ARG2) (AND ARG2 ARG1)))))
       (defun ADF1 (ARG0 ARG1 ARG2)
         (values (NOR (ADF0 (AND ARG2 ARG1) (NOR ARG0 ARG1)
           (ADF0 ARG1 ARG2 ARG0 ARG0) (NOR ARG0 ARG0)) (NOR ARG1
           ARG2))))
       (values (ADF0 (ADF1 (OR D1 D3) (ADF1 D1 D4 D1) (NAND D0
         D2)) (NAND (NAND D4 D0) (NAND D1 D4)) (NAND (AND D3 D2)
         (OR D1 D4)) (ADF1 (NAND D2 D0) (OR D4 D0) (OR D2
         D0))))).
The best-of-generation program in generation 1 scores 19 hits. Both ADF0
and ADF1 take four arguments so this program has an argument map for its
ADFs of {4, 4}.
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
         (values (NAND (OR (OR ARG3 ARG0) (OR ARG2 ARG2)) (OR
           (NAND ARG0 ARG3) (NOR ARG1 ARG0)))))
       (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
         (values (NOR (OR (AND ARG3 ARG1) (AND ARG0 ARG0)) (ADF0
           (NAND ARG2 ARG0) (OR ARG2 ARG2) (ADF0 ARG1 ARG0 ARG3
           ARG2) (OR ARG2 ARG1)))))
       (values (NAND (AND (OR D2 D3) (ADF0 D3 D4 D0 D3)) (OR
         (AND D2 D3) (ADF0 D4 D1 D0 D1))))).
The best-of-generation program in generation 2 scores 20 hits and has three,
instead of only two, automatically defined functions. It has an argument map
for its ADFs of {3, 4, 4} and is shown below:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (AND (NAND (OR ARG2 ARG1) (NOR ARG1 ARG1))
           (NAND (NAND ARG1 ARG0) (AND ARG1 ARG1)))))
       (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
         (values (OR (OR (AND ARG1 ARG1) (AND ARG1 ARG2)) (NOR
           (OR ARG0 ARG3) (NOR (NAND ARG3 ARG1) (OR ARG0
           ARG2))))))
       (defun ADF2 (ARG0 ARG1 ARG2 ARG3)
         (values (ADF0 (ADF1 (AND ARG3 ARG1) (NAND ARG2 ARG1)
           (AND ARG2 ARG3) (NOR ARG2 ARG3)) (NAND (ADF0 ARG3
           ARG2 ARG0) (ADF1 ARG3 ARG2 ARG2 ARG0)) (ADF0 (OR ARG0
           ARG1) (OR ARG2 ARG3) (ADF0 ARG0 ARG0 ARG1)))))
       (values (ADF2 (AND (NAND D3 D3) (NOR D4 D1)) (AND (NOR D2
         D2) (ADF1 D4 D4 D0 D3)) (NOR (AND D2 D2) (ADF2 D2 D0 D4
         D3)) (NAND (AND D1 D1) (NAND D3 D2))))).
The best of generation 3 is a different program that also scores 20 hits and has the same argument map for its ADFs.
The best of generation 4 scores 26 hits and has an argument map for its ADFs of {3, 5}.
The problem is solved on generation 5 of this run with the following program with an argument map for its ADFs of {3, 5}:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
         (values (NOR (NOR (AND ARG2 ARG1) (NOR ARG0 ARG2)) (AND
           (NOR ARG0 ARG2) (OR ARG2 ARG1)))))
       (defun ADF1 (ARG0 ARG1 ARG2 ARG3 ARG4)
         (values (NOR (AND ARG0 ARG2) (NOR (AND ARG3 ARG2) (OR
           ARG0 ARG2)))))
       (values (ADF0 (NOR D0 D4) (ADF1 (ADF0 D3 D1 D3) (OR D1
         D1) (ADF1 D4 D0 D2 D1 D3) (NAND D0 D3) (ADF0 D1 D4 D4))
         (AND (NAND (OR D3 D3) (NOR (NAND D2 D2) (OR (NAND (AND
         D2 D4) (OR D1 D1)) (OR (NAND D4 D3) (NAND D0 D2)))))
         (NOR D0 D0))))).
In this 100%-correct solution, ADF0 is three-argument Boolean rule 195, which performs the even-2-parity function on two of the three arguments available to it (ARG1 and ARG2). It is equivalent to
(EVEN-2-PARITY ARG1 ARG2).
ADF1 is five-argument Boolean rule 1,437,226,410 and is equivalent to
(ODD-2-PARITY ARG0 ARG3).
Rule 1,437,226,410 performs the odd-2-parity function on two of the five
arguments available to it (ARG0 and ARG3).
Thus, this 100%-correct solution is a composition of two parity functions of
order two (one odd-parity and one even-parity).
Figure 21.5 Argument trajectory of the number of arguments in ADFs for the best-of-generation programs between generations 0 and 5 of the {3, 5} run of the even-5-parity problem with
evolution of architecture.
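The rule numbers quoted for the two parity functions can be checked by enumerating truth tables. The sketch below assumes one particular numbering convention, namely that fitness case i assigns to ARGk the k-th binary digit of i and that the output for case i is the i-th binary digit of the rule number; under that assumption the quoted values are reproduced. The function name rule_number is illustrative.

```python
def rule_number(fn, n_args):
    """Rule number of an n-argument Boolean function.

    Assumed convention: in fitness case i, ARGk is bit k of i, and the
    function's output for case i becomes bit i of the rule number.
    """
    total = 0
    for i in range(2 ** n_args):
        args = [(i >> k) & 1 for k in range(n_args)]   # args[k] is ARGk
        if fn(*args):
            total |= 1 << i
    return total


# Even-2-parity of ARG1 and ARG2 as a three-argument function.
even2 = rule_number(lambda a0, a1, a2: a1 == a2, 3)
# Odd-2-parity of ARG0 and ARG3 as a five-argument function.
odd2 = rule_number(lambda a0, a1, a2, a3, a4: a0 != a3, 5)
print(even2, odd2)
```

With this convention the two calls return 195 and 1,437,226,410; the latter is 55AA55AA in hexadecimal, whose alternating bytes reflect the dependence on ARG0 and ARG3 alone.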
Figure 21.5 is a three-dimensional trajectory, called the argument trajectory,
for this run showing, by generation, the number of arguments of ADF0, ADF1,
and ADF2 (if present) of the best-of-generation programs. It is possible to
visualize this trajectory in three dimensions only because the number of
automatically defined functions for the best-of-generation program does not
exceed three for this particular run. As can be seen, the trajectory begins on
the floor of the three-dimensional region at generation 0 with a best-of-generation program that has only two automatically defined functions and an
argument map for its ADFs of {4, 3}. The trajectory is on the floor of the three-dimensional region because the best of generation 0 does not possess an ADF2.
The trajectory continues along the floor in generation 1, but rises from the
floor in generations 2 and 3 when ADF2 is present. The trajectory falls back to
the floor with {3, 5} in generations 4 and 5. We call this run the "{3, 5} run"
because {3, 5} is the argument map for the ADFs of the 100%-correct best-of-run individual on the final generation of the run.
Figure 21.6 Fitness-branch trajectory showing the number of ADFs and hits for the best
of generations 0 through 5 of the {3, 5} run of the even-5-parity problem with evolution of
architecture.
Figure 21.6 shows the trajectory, by generation, of the best-of-generation
programs according to their raw fitness (hits) and the number of automatically defined functions for the {3, 5} run. This type of figure is called a fitness-branch trajectory.
Figure 21.7 is a three-dimensional histogram (called the branch histogram)
showing, by generation, the number of programs in the population of 4,000
of the {3, 5} run with a specified number (from 0 to 5) of automatically defined
functions. For generation 0 there is approximate equality in the number of
programs in the population with zero, one, two, three, four, or five automatically defined functions. However, by generation 5, the number of programs
with no automatically defined functions drops from an initial value of 760 at
generation 0 to only 200 at generation 5. Similarly, the number of programs
with only one automatically defined function drops from 646 to only 197, and
the number of programs with four automatically defined functions decreases
from 658 to 180. By generation 5, most of the programs have two, three, or five
automatically defined functions.
This run is unique among the 26 successful runs in that it is the only run
where the programs with no automatically defined functions did not become
extinct in the population before a solution was found. However, even in this
run, only 5% of the individuals in the population had no automatically
defined functions by generation 5. The poor performance of programs
with no automatically defined functions (relative to the programs with
automatically defined functions) causes their near-extinction in the competitive environment in which evolution is being used to determine the architecture of the overall program while the solution to the problem is being found.
Figure 21.7 Branch histogram between generations 0 and 5 showing the number of programs
in the population with various numbers of ADFs for the {3, 5} run of the even-5-parity problem
with evolution of architecture.
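The population counts quoted for the {3, 5} run can be turned into shares of the population of 4,000. The sketch below uses only the counts stated in the text; the counts at generation 5 for programs with two, three, and five automatically defined functions are not given, so they appear only as a combined remainder. The variable names are illustrative.

```python
POPULATION = 4000

# Programs by number of ADFs at generation 0 (counts from the text).
GEN0 = {0: 760, 1: 646, 2: 652, 3: 652, 4: 658, 5: 632}
# Counts quoted for generation 5; the other three classes are not stated.
GEN5_PARTIAL = {0: 200, 1: 197, 4: 180}

no_adf_share_gen0 = GEN0[0] / POPULATION
no_adf_share_gen5 = GEN5_PARTIAL[0] / POPULATION
# Programs with two, three, or five ADFs at generation 5, taken together.
remainder_gen5 = POPULATION - sum(GEN5_PARTIAL.values())

print(sum(GEN0.values()), no_adf_share_gen0, no_adf_share_gen5, remainder_gen5)
```

The share of programs without automatically defined functions falls from 19% at generation 0 to the 5% quoted for generation 5.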
The population sometimes converges to a particular configuration of automatically defined functions and number of arguments. The {3} run (row 2 of
table 21.3) produces a solution on generation 16. Starting with generation 3,
the best-of-generation programs all consist of one three-argument automatically defined function.
Figure 21.8 shows the three-dimensional trajectory, by generation, of the
number of arguments of ADF0 and ADF1 of the best-of-generation programs
for the {3} run. Since none of the best-of-generation programs in this run
employ ADF2, the trajectory begins on the floor of the three-dimensional
region at generation 0 with a best-of-generation program having an argument map for its ADFs of {4, 3}, moves to the point {3} on the ADF0 axis on
generation 1, returns to {4, 3} for generation 2, returns to {3} on generation 3,
and then stays there until the problem is solved at generation 16.
Figure 21.8 Argument trajectory between generations 0 and 16 for the {3} run of the even-5-parity problem with evolution of architecture.
Figure 21.9 Fitness-branch trajectory between generations 0 and 16 of the {3} run of the even-5-parity problem with evolution of architecture.
Figure 21.10 Branch histogram between generations 0 and 16 for the {3} run of the even-5-parity problem.
Figure 21.11 Fitness-branch trajectory between generations 0 and 19 of the {2} run of the even-5-parity problem with evolution of architecture.
Figure 21.9 shows the trajectory, by generation, of the best-of-generation programs according to their raw fitness (hits) and the number of automatically defined functions for the {3} run.
Figure 21.10 is a three-dimensional histogram for the {3} run showing, by generation, the number of programs in the population of 4,000 with a specified number (from 0 to 5) of automatically defined functions. As can be seen, programs with one automatically defined function quickly start to dominate the population. By generations 15 and 16, 100% of the population has one automatically defined function.
This run produced the following solution on generation 16:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (NOR (AND (NAND (NAND (NOR (NAND (NAND (NAND
ARG0 ARG0) (NAND (NOR ARG2 ARG0) (NOR ARG0 ARG0)))
(NOR ARG1 ARG1)) (NOR ARG0 ARG0)) (NOR ARG1 ARG1))
(NOR ARG1 ARG1)) (AND (NOR ARG0 ARG0) (NOR ARG0
ARG2))) (NOR (AND (NOR ARG1 ARG2) (NOR ARG0 ARG2))
(AND (NAND (AND ARG0 ARG2) ARG1) (NAND (NAND ARG2
ARG0) (NOR ARG1 ARG1)))))))
(values (ADF0 (OR D1 D4) (NAND (ADF0 (ADF0 D2 D1 D0) (NAND
D4 D1) (OR D1 D3)) (NAND (ADF0 D2 D1 D0) (AND D3 D0)))
(AND (OR (NAND (NAND D3 D2) D0) (NAND D4 D1)) (AND (OR D1
D4) (AND D3 D1)))))).
In this 100%-correct solution, the one automatically defined function ADF0
performs the even-3-parity function (Boolean rule 105),
Figure 21.12 Branch histogram between generations 0 and 19 for the {2} run of the even-5-parity problem.
Figure 21.13 Fitness-branch trajectory between generations 0 and 20 of the {3,2,3,3,3} run of the even-5-parity problem with evolution of architecture.
(EVEN-3-PARITY ARG0 ARG1 ARG2).
Thus, this solution to the even-5-parity problem is built up from the even-3-parity function.
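As an illustrative check (not part of the original text), the rule number quoted for ADF0 can be recomputed by enumerating its truth table. The Python sketch below assumes that a Boolean rule number is the integer whose bit k holds the function's output for input combination k, with ARG0 as the least significant bit of k; this convention is inferred from the rule numbers quoted in this chapter (6 for odd-2-parity, 105 for even-3-parity) and is not stated in this section. The helper name `rule_number` is hypothetical, and the body of `adf0` is a transcription of the {3} run's listing as read here.

```python
# Primitive Boolean functions of the even-parity problems, on 0/1 integers.
def AND(a, b): return a & b
def OR(a, b): return a | b
def NAND(a, b): return 1 - (a & b)
def NOR(a, b): return 1 - (a | b)


def adf0(ARG0, ARG1, ARG2):
    """ADF0 of the {3} run, transcribed from the listing (assumed reading)."""
    return NOR(
        AND(
            NAND(
                NAND(
                    NOR(
                        NAND(
                            NAND(NAND(ARG0, ARG0),
                                 NAND(NOR(ARG2, ARG0), NOR(ARG0, ARG0))),
                            NOR(ARG1, ARG1)),
                        NOR(ARG0, ARG0)),
                    NOR(ARG1, ARG1)),
                NOR(ARG1, ARG1)),
            AND(NOR(ARG0, ARG0), NOR(ARG0, ARG2))),
        NOR(
            AND(NOR(ARG1, ARG2), NOR(ARG0, ARG2)),
            AND(NAND(AND(ARG0, ARG2), ARG1),
                NAND(NAND(ARG2, ARG0), NOR(ARG1, ARG1)))))


def rule_number(f, n_args):
    """Pack the truth table of f into an integer (assumed convention:
    bit k = output for input combination k, ARG0 is the lowest bit of k)."""
    total = 0
    for k in range(2 ** n_args):
        args = [(k >> i) & 1 for i in range(n_args)]
        total |= f(*args) << k
    return total


def even_3_parity(a, b, c):
    """1 when an even number of the three inputs are 1."""
    return 1 - (a ^ b ^ c)


ADF0_RULE = rule_number(adf0, 3)
EVEN_3_PARITY_RULE = rule_number(even_3_parity, 3)
```

Under these assumptions both `ADF0_RULE` and `EVEN_3_PARITY_RULE` come out as 105, which agrees with the statement that ADF0 performs the even-3-parity function.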
In the {2} run (row 1 in table 21.3), the population became dominated by programs with one and four automatically defined functions.
Figure 21.11 shows the trajectory, by generation, of the best-of-generation programs according to their raw fitness (hits) and the number of automatically defined functions for the {2} run.
Figure 21.12 is a three-dimensional histogram for the {2} run showing, by generation, the number of programs in the population with a specified number (from 0 to 5) of automatically defined functions. As can be seen, by generation 12, 100% of the population has converged to programs with either one or four automatically defined functions. The number of programs in the population with only one automatically defined function is very small at generation 12; however, these programs proliferate and the solution to the problem at generation 19 turns out to have only one automatically defined function.
The best of generations 6, 7, 8, 9, 10, 16, and 17 have an argument map of their ADFs of {1,3,3,4} and the best of all other generations have an argument map of their ADFs of {2}.
The following 100%-correct program with an argument map for its ADFs
of {2} emerged on generation 19:
(progn (defun ADF0 (ARG0 ARG1)
(values (NOR (AND (AND (AND ARG1 ARG0) ARG0) ARG1) (AND
(AND (NAND ARG1 ARG1) (NAND ARG0 ARG0)) (NAND (OR ARG1
ARG0) (NOR ARG1 (NOR ARG1 ARG0)))))))
(values (ADF0 (NAND (OR D2 D1) (ADF0 D4 (ADF0 D3 D0))) (AND
(NAND D1 D2) (OR (OR D2 D1) (ADF0 D4 (ADF0 D3 D0)))))))
ADF0 of this 100%-correct program performs
(ODD-2-PARITY ARG0 ARG1),
which is the two-argument Boolean rule 6. This solution to the even-5-parity
problem is composed of four invocations of the odd-2-parity function.
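The claim that this program is 100% correct can be checked by brute force over all 32 fitness cases. The following Python sketch is an illustration only: it transcribes the {2} run's listing as read here, the names `program` and `even_5_parity` are hypothetical, and the repeated subexpression (ADF0 D4 (ADF0 D3 D0)) is evaluated once and reused, which does not change the result.

```python
from itertools import product

# Primitive Boolean functions on 0/1 integers.
def AND(a, b): return a & b
def OR(a, b): return a | b
def NAND(a, b): return 1 - (a & b)
def NOR(a, b): return 1 - (a | b)


def adf0(ARG0, ARG1):
    """ADF0 of the {2} run (transcribed; expected to act as odd-2-parity)."""
    return NOR(
        AND(AND(AND(ARG1, ARG0), ARG0), ARG1),
        AND(AND(NAND(ARG1, ARG1), NAND(ARG0, ARG0)),
            NAND(OR(ARG1, ARG0), NOR(ARG1, NOR(ARG1, ARG0)))))


def program(D0, D1, D2, D3, D4):
    """Result-producing branch of the {2} run."""
    inner = adf0(D4, adf0(D3, D0))  # appears twice in the listing
    return adf0(NAND(OR(D2, D1), inner),
                AND(NAND(D1, D2), OR(OR(D2, D1), inner)))


def even_5_parity(bits):
    """1 when an even number of the five inputs are 1."""
    return 1 if sum(bits) % 2 == 0 else 0


# Raw fitness ("hits"): number of the 32 fitness cases answered correctly.
HITS = sum(program(*bits) == even_5_parity(bits)
           for bits in product((0, 1), repeat=5))
```

With this reading of the listing, `HITS` equals 32, the maximum for the even-5-parity problem, and `adf0` agrees with exclusive-or on all four of its input combinations.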
In the {3,2,3,3,3} run (row 26 of table 21.3), the population quickly became dominated by programs with one and five automatically defined functions.
Figure 21.13 shows the trajectory, by generation, of the best-of-generation programs according to their raw fitness (hits) and the number of automatically defined functions. This zigzagging trajectory shows that programs with one and five automatically defined functions are battling it out for supremacy within this run.
Figure 21.14 is a three-dimensional histogram for the {3,2,3,3,3} run showing, by generation, the number of programs in the population with a specified number (from 0 to 5) of automatically defined functions. Programs with five automatically defined functions are almost extinct in generation 6. The battle for supremacy shown in figure 21.13 is reflected here by the dominance of the programs with either one or five automatically defined functions in the later generations of this run. As can be seen, 100% of the population in generation 20 has either one or five automatically defined functions.
The best of generations 0, 1, 3, 4, 10, and 11 have an argument map for their ADFs of {4} and the best of all other generations have an argument map for their ADFs of {3,2,3,3,3}.
The following 100%-correct program with an argument map for its ADFs
of {3,2,3,3,3} emerged on generation 20:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (OR (NOR (AND (NAND ARG1 ARG0) (OR ARG0 ARG1))
(NAND (OR ARG2 ARG2) (NOR ARG0 ARG0))) (OR (NOR (NAND
ARG1 ARG0) (NOR ARG2 ARG1)) (AND (NOR ARG0 ARG0) (AND
ARG0 ARG1))))))
(defun ADF1 (ARG0 ARG1)
(values (OR (ADF0 (OR (NOR ARG1 ARG1) (NAND ARG1 ARG1))
(AND ARG0 ARG1) (AND ARG0 ARG1)) (ADF0 (ADF0 (AND ARG1
ARG0) (OR ARG1 ARG1) (AND ARG0 ARG1)) (ADF0 (AND ARG0
ARG1) (NAND ARG1 ARG0) (NOR ARG1 ARG1)) (OR (AND ARG0
ARG1) (OR ARG0 ARG1))))))
(defun ADF2 (ARG0 ARG1 ARG2)
(values (NAND (NAND (NOR (OR ARG0 ARG2) (ADF0 ARG1 ARG2
ARG0)) (AND (ADF1 ARG1 ARG1) (OR (NAND ARG0 ARG1) (AND
(NOR ARG0 ARG1) ARG1)))) (NAND (NOR (OR ARG1 ARG0)
(ADF0 ARG2 ARG2 ARG1)) (ADF1 (NAND ARG1 ARG2) (AND
ARG2 ARG0))))))
(defun ADF3 (ARG0 ARG1 ARG2)
(values (ADF0 (OR (OR (NAND ARG1 ARG2) (NOR ARG0 ARG2))
(OR (OR ARG1 (NAND ARG0 ARG1)) (AND ARG0 ARG1))) (NOR
(ADF1 (AND ARG0 ARG0) (NOR ARG2 ARG1)) (OR (NOR ARG1
ARG1) (ADF2 ARG2 ARG0 ARG0))) (ADF1 (AND (ADF2 ARG0
ARG0 ARG0) (AND ARG0 ARG0)) (ADF1 (AND ARG1 ARG2)
(ADF2 ARG1 ARG2 ARG2))))))
(defun ADF4 (ARG0 ARG1 ARG2)
(values (OR (OR (ADF0 (AND ARG2 ARG0) (ADF3 ARG1 ARG0
ARG0) (ADF0 ARG2 ARG0 ARG2)) (ADF2 (ADF0 ARG2 ARG2
ARG2) (AND ARG1 ARG1) (OR ARG2 ARG0))) (OR (ADF0 (ADF0
ARG0 ARG1 ARG0) (NAND ARG2 ARG2) (ADF3 ARG0 ARG1
ARG1)) (OR (ADF1 ARG2 ARG2) (ADF2 ARG2 (NAND ARG1
ARG0) ARG1))))))
(values (ADF4 (ADF4 (ADF0 (NAND D4 D0) (NAND D0 D3) (NOR D3
D3)) (AND D1 D1) (ADF1 (ADF2 D2 D2 D0) (NAND D4 D0)))
(ADF0 (ADF4 (NOR D3 D4) (NAND D2 D0) (NOR D3 D2)) (AND
(NOR D3 D3) (ADF1 D2 D3)) (OR (ADF3 D0 D0 D2) (OR D0
D4))) (NAND (AND (NAND D2 D3) (NOR D3 D3)) (NOR (OR D4
D0) (OR D2 D2)))))).
Figure 21.14 Branch histogram between generations 0 and 20 for the {3,2,3,3,3} run of the even-5-parity problem.
In this solution, ADF0 is equivalent to three-argument Boolean rule 152; ADF1 is equivalent to (ODD-2-PARITY ARG0 ARG1); ADF2 is equivalent to three-argument Boolean rule 1; ADF3 is equivalent to three-argument Boolean rule 64; and ADF4 is equivalent to (EVEN-3-PARITY ARG0 ARG1 ARG2). That is, two of the five automatically defined functions here are parity rules and three of the five automatically defined functions are not.
In the {4,2} run (row 6 of table 21.3), the population quickly becomes dominated by programs with one and two automatically defined functions, although a few programs with four and five automatically defined functions remain.
Figure 21.15 is a three-dimensional trajectory showing, by generation, the number of arguments of ADF0 and ADF1 of the best-of-generation programs for the {4,2} run. As can be seen, the trajectory begins on the floor of the three-dimensional region at generations 0 and 1 with a best-of-generation program having an argument map for its ADFs of {4,2}, moves along the floor to {2,5} on generation 2, returns to {4,2} for generation 3, and then stays there until the problem is solved at generation 9.
Figure 21.16 shows the trajectory, by generation, of the best-of-generation programs according to their raw fitness (hits) and the number of automatically defined functions for the {4,2} run.
Figure 21.17 is a three-dimensional histogram for the {4,2} run showing, by generation, the number of programs in the population with a specified number (from 0 to 5) of automatically defined functions.
Figure 21.15 Argument trajectory between generations 0 and 9 of the {4,2} run of the even-5-parity problem with evolution of architecture.
Figure 21.16 Fitness-branch trajectory between generations 0 and 9 of the {4,2} run of the even-5-parity problem with evolution of architecture.
Figure 21.17 Branch histogram between generations 0 and 9 for the {4,2} run of the even-5-parity problem.
The best of generation 2 has an argument map for its ADFs of {2,5}. The best of all other generations have argument maps for their ADFs of {4,2}.
The following 100%-correct program with an argument map for its ADFs
of {4,2} emerged on generation 9:
(progn (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
(values (OR (AND (NOR (NAND (OR (OR (AND ARG2 ARG2) (NOR
ARG3 ARG2)) (NOR ARG1 ARG3)) (NAND (NAND ARG1 ARG3)
(AND ARG1 ARG1))) (AND (NOR (AND ARG0 ARG1) (OR ARG3
ARG3)) (OR (AND ARG2 ARG2) (NAND ARG3 ARG1)))) ARG1)
(OR (NOR ARG3 ARG2) (NOR ARG1 ARG0)))))
(defun ADF1 (ARG0 ARG1)
(values (AND (NAND (AND ARG1 ARG1) (AND ARG1 ARG0)) (OR
(OR ARG1 ARG0) (OR ARG1 ARG0)))))
(values (ADF1 (ADF0 (OR D0 D1) (NOR D3 D3) (NAND D3 D0)
(ADF1 D1 D0)) (ADF1 (AND D4 D4) (NAND D2 D2))))).
ADF0 is equivalent to four-argument Boolean rule 53,535 and ADF1 is
equivalent to (ODD-2-PARITY ARG0 ARG1). In other words, this solution
Figure 21.18 Fitness-branch trajectory between generations 0 and 40 of the {3,2,2,3} run of the even-5-parity problem with evolution of architecture.
Figure 21.19 Branch histogram between generations 0 and 40 for the {3,2,2,3} run of the even-5-parity problem.
Figure 21.20 Fitness-branch trajectory between generations 0 and 22 of the {2,4,5,1,5} run of the even-5-parity problem with evolution of architecture.
is built up from one lower-order parity function and one Boolean function
that is not a parity rule.
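The characterization of the {4,2} solution can be reproduced by enumeration. The Python sketch below is illustrative and rests on two assumptions: the two ADFs and the result-producing branch are transcribed from the listing as read here, and a rule number is taken to be the integer whose bit k is the output for input combination k with ARG0 as the least significant bit (a convention inferred from the rule numbers in this chapter). The helper names are hypothetical.

```python
from itertools import product

# Primitive Boolean functions on 0/1 integers.
def AND(a, b): return a & b
def OR(a, b): return a | b
def NAND(a, b): return 1 - (a & b)
def NOR(a, b): return 1 - (a | b)


def adf0(ARG0, ARG1, ARG2, ARG3):
    """Four-argument ADF0 of the {4,2} run (transcribed)."""
    return OR(
        AND(
            NOR(
                NAND(OR(OR(AND(ARG2, ARG2), NOR(ARG3, ARG2)), NOR(ARG1, ARG3)),
                     NAND(NAND(ARG1, ARG3), AND(ARG1, ARG1))),
                AND(NOR(AND(ARG0, ARG1), OR(ARG3, ARG3)),
                    OR(AND(ARG2, ARG2), NAND(ARG3, ARG1)))),
            ARG1),
        OR(NOR(ARG3, ARG2), NOR(ARG1, ARG0)))


def adf1(ARG0, ARG1):
    """Two-argument ADF1 of the {4,2} run (transcribed)."""
    return AND(NAND(AND(ARG1, ARG1), AND(ARG1, ARG0)),
               OR(OR(ARG1, ARG0), OR(ARG1, ARG0)))


def program(D0, D1, D2, D3, D4):
    """Result-producing branch of the {4,2} run."""
    return adf1(adf0(OR(D0, D1), NOR(D3, D3), NAND(D3, D0), adf1(D1, D0)),
                adf1(AND(D4, D4), NAND(D2, D2)))


def rule_number(f, n_args):
    """Truth table of f packed into an integer (assumed bit convention)."""
    return sum(f(*[(k >> i) & 1 for i in range(n_args)]) << k
               for k in range(2 ** n_args))


ADF0_RULE = rule_number(adf0, 4)
ADF1_RULE = rule_number(adf1, 2)
HITS = sum(program(*bits) == (1 if sum(bits) % 2 == 0 else 0)
           for bits in product((0, 1), repeat=5))
```

Under these assumptions `ADF0_RULE` is 53,535, `ADF1_RULE` is 6 (odd-2-parity), and `HITS` is 32, consistent with the description above.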
In the {3,2,2,3} run (row 18 of table 21.3), the population quickly converged completely to programs with four automatically defined functions.
Figure 21.18 shows the trajectory, by generation, of the best-of-generation programs according to their raw fitness (hits) and the number of automatically defined functions. The best of generation 40 has an argument map for its ADFs of {3,2,2,3}.
Figure 21.19 is a three-dimensional histogram for the {3,2,2,3} run showing, by generation, the number of programs in the population with a specified number (from 0 to 5) of automatically defined functions. By generation 7, all 4,000 programs in the population have four automatically defined functions.
The following 100%-correct program with an argument map for its ADFs
of {3,2,2,3} emerged on generation 40:
(progn (defun ADF0 (ARG0 ARG1 ARG2)
(values (OR (OR (NAND (NAND ARG0 ARG1) (NOR ARG1 ARG0))
(AND (AND ARG1 ARG1) (NOR ARG1 ARG2))) (NOR (AND (OR
ARG0 ARG2) (OR (NAND ARG1 (NAND ARG0 ARG1)) (OR ARG0
ARG1))) (OR (NAND ARG0 ARG1) (NOR ARG0 ARG0))))))
(defun ADF1 (ARG0 ARG1)
(values (OR (NAND (OR (ADF0 ARG1 ARG0 ARG1) (AND ARG0
ARG1)) (NAND (AND ARG0 ARG1) (ADF0 ARG0 ARG0 ARG1)))
(AND (OR (ADF0 ARG1 ARG1 ARG1) (ADF0 ARG0 ARG1 ARG0))
(NOR (NAND ARG1 ARG1) (NOR ARG0 ARG0))))))
(defun ADF2 (ARG0 ARG1)
(values (ADF1 (AND (AND (ADF1 ARG1 ARG0) (AND ARG0
ARG0)) ARG1) (ADF0 (NOR (OR ARG0 ARG1) (OR ARG0 ARG0))
(NOR (NAND ARG1 ARG0) (ADF0 ARG1 (NAND ARG1 ARG1)
ARG0)) (NAND ARG1 ARG0)))))
(defun ADF3 (ARG0 ARG1 ARG2)
(values (NAND (NAND (NAND (NAND ARG1 ARG0) (NAND ARG0
ARG1)) (ADF1 (ADF2 ARG2 ARG1) (ADF1 ARG1 ARG2))) (ADF2
(OR (NOR (AND (NAND ARG0 ARG1) (NAND ARG1 ARG0)) (OR
(NOR ARG1 ARG0) (NOR ARG1 ARG0))) (NAND (OR ARG0 ARG2)
(NOR ARG0 ARG0))) (ADF0 (NAND ARG1 ARG0) (AND ARG2
ARG0) (ADF1 ARG2 ARG2))))))
(values (ADF1 (NOR (AND D0 D3) (NOR D1 D3)) (ADF1 (ADF1
(NOR D2 D4) (OR D3 D0)) (ADF3 (NAND D2 D4) (OR D2 D3)
(ADF0 (NOR D1 D3) D1 D3)))))).
Figure 21.21 Branch histogram between generations 0 and 22 for the {2,4,5,1,5} run of the even-5-parity problem.
ADF0 is equivalent to three-argument Boolean rule 238; ADF1 is equivalent to (EVEN-2-PARITY ARG0 ARG1); ADF2 is equivalent to (ODD-2-PARITY ARG0 ARG1); and ADF3 is equivalent to three-argument Boolean rule 167. This solution is composed of two parity functions of order two (one odd and one even) and two other Boolean functions that are not parity rules.
The solution produced by the {2,4,5,1,5} run (row 24 of table 21.3) is one of only three solutions (of the 26 solutions) where no automatically defined function is a lower-order parity function.
Figure 21.20 shows the trajectory, by generation, of the best-of-generation programs according to their raw fitness (hits) and the number of automatically defined functions for the {2,4,5,1,5} run.
Figure 21.21 is a three-dimensional histogram for the {2,4,5,1,5} run showing, by generation, the number of programs in the population with a specified number (from 0 to 5) of automatically defined functions. The best of generation 0 of the {2,4,5,1,5} run has an argument map for its ADFs of {4}. The best of several early generations have an argument map for their ADFs of {3,2}. However, starting at generation 9, all 4,000 programs in the population converge to an argument map for ADFs of {2,4,5,1,5} involving five automatically defined functions.
The following 100%-correct program with an argument map for its ADFs
of {2,4,5,1,5} emerges on generation 22:
(progn (defun ADF0 (ARG0 ARG1)
(values (NOR (AND (AND ARG1 ARG1) (NOR ARG1 ARG1)) (AND
(NOR ARG1 ARG1) (NOR ARG1 ARG1)))))
(defun ADF1 (ARG0 ARG1 ARG2 ARG3)
(values (OR (NAND (OR ARG0 ARG3) (NAND ARG0 ARG2)) (NOR
(NAND ARG1 ARG3) (NAND ARG0 ARG0)))))
(defun ADF2 (ARG0 ARG1 ARG2 ARG3 ARG4)
(values (NAND (OR (AND ARG0 ARG1) (NAND ARG3 ARG1))
(ADF0 (NOR ARG2 (NAND (NAND ARG1 ARG0) ARG0)) (OR (AND
(OR (OR ARG0 ARG2) ARG1) (NAND (OR ARG0 ARG2) (NAND
ARG2 ARG0))) (NAND (NAND ARG1 (OR (NOR ARG1 ARG1)
ARG0)) ARG0))))))
(defun ADF3 (ARG0)
(values (ADF2 (ADF2 (NOR ARG0 ARG0) (NAND ARG0 ARG0)
(ADF0 ARG0 ARG0) (AND ARG0 ARG0) (AND ARG0 ARG0))
(AND (AND ARG0 ARG0) (ADF2 ARG0 ARG0 ARG0 ARG0 ARG0))
(ADF0 (AND ARG0 ARG0) (OR ARG0 ARG0)) (OR (ADF2 ARG0
ARG0 ARG0 ARG0 ARG0) (NOR (AND ARG0 ARG0) ARG0)) (OR
(ADF0 ARG0 ARG0) (ADF1 ARG0 ARG0 ARG0 ARG0)))))
(defun ADF4 (ARG0 ARG1 ARG2 ARG3 ARG4)
(values (OR (NAND (NAND ARG0 ARG0) (NAND ARG0 ARG2))
(NAND (OR ARG0 ARG3) (NAND ARG0 ARG2)))))
(values (ADF1 (ADF1 (ADF1 D3 D3 D1 D2) (AND D1 D1) (AND D1
D2) (ADF1 D2 D3 D1 D3)) (OR D0 (NAND D4 D0)) (ADF2 (NOR
D3 D4) (ADF2 D1 D4 D3 D3 D2) (ADF0 D0 D2) (NOR D1 D3)
(NAND (NAND (OR D3 D0) (OR D2 D3)) (OR D2 D3))) (ADF0
(ADF0 D0 D1) (ADF2 D0 D4 D4 D4 D3))))).
ADF0 of this 100%-correct program performs two-argument Boolean rule 12; ADF1 performs four-argument Boolean rule 43,253; ADF2 performs five-argument Boolean rule 1,174,554,114; ADF3 performs one-argument Boolean rule 0 (always false); and ADF4 performs five-argument Boolean rule 2,868,882,175.
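The five rule numbers just quoted can be recomputed from the function-defining branches. The Python sketch below is an illustration under stated assumptions: the branches are transcribed from the {2,4,5,1,5} listing as read here, and a rule number is taken to be the integer whose bit k is the output for input combination k with ARG0 as the least significant bit (inferred from this chapter's rule numbers, not stated in it). The helper `rule_number` is hypothetical.

```python
# Primitive Boolean functions on 0/1 integers.
def AND(a, b): return a & b
def OR(a, b): return a | b
def NAND(a, b): return 1 - (a & b)
def NOR(a, b): return 1 - (a | b)


def adf0(a0, a1):
    return NOR(AND(AND(a1, a1), NOR(a1, a1)), AND(NOR(a1, a1), NOR(a1, a1)))


def adf1(a0, a1, a2, a3):
    return OR(NAND(OR(a0, a3), NAND(a0, a2)), NOR(NAND(a1, a3), NAND(a0, a0)))


def adf2(a0, a1, a2, a3, a4):
    return NAND(
        OR(AND(a0, a1), NAND(a3, a1)),
        adf0(NOR(a2, NAND(NAND(a1, a0), a0)),
             OR(AND(OR(OR(a0, a2), a1), NAND(OR(a0, a2), NAND(a2, a0))),
                NAND(NAND(a1, OR(NOR(a1, a1), a0)), a0))))


def adf3(a0):
    return adf2(
        adf2(NOR(a0, a0), NAND(a0, a0), adf0(a0, a0), AND(a0, a0), AND(a0, a0)),
        AND(AND(a0, a0), adf2(a0, a0, a0, a0, a0)),
        adf0(AND(a0, a0), OR(a0, a0)),
        OR(adf2(a0, a0, a0, a0, a0), NOR(AND(a0, a0), a0)),
        OR(adf0(a0, a0), adf1(a0, a0, a0, a0)))


def adf4(a0, a1, a2, a3, a4):
    return OR(NAND(NAND(a0, a0), NAND(a0, a2)),
              NAND(OR(a0, a3), NAND(a0, a2)))


def rule_number(f, n_args):
    """Truth table of f packed into an integer (assumed bit convention)."""
    return sum(f(*[(k >> i) & 1 for i in range(n_args)]) << k
               for k in range(2 ** n_args))


# One rule number per ADF, using the argument map {2,4,5,1,5}.
RULES = [rule_number(f, n) for f, n in
         [(adf0, 2), (adf1, 4), (adf2, 5), (adf3, 1), (adf4, 5)]]
```

With this transcription `RULES` evaluates to [12, 43253, 1174554114, 0, 2868882175], matching the rule numbers given in the text. None of these is a parity rule: for instance, rule 12 simply returns ARG1.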
Table 21.4 shows, in its first two columns, the run number and the argument map for the ADFs of the 26 solutions to the even-5-parity problem. Each pair of the 10 remaining columns relates to each of the five automatically defined functions that may appear in a particular solution. The Boolean rule number appears in the first column of each such pair. If the Boolean rule is a parity rule, the second column identifies it. As can be seen, only runs 7, 22, and 24 of these 26 successful runs solve the problem without using at least one parity rule. That is, 88% of these runs invoke lower-order parity functions (of orders two and three). In contrast, only 42% of the 19 solutions shown in table 6.6 invoke lower-order parity functions.
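To see why an ADF with a large rule number can still be labeled a parity rule, it helps to compute the rule number of a parity function that looks at only some of an ADF's arguments. The Python sketch below is a hedged illustration: the function `parity_rule` is hypothetical, and the bit convention (bit k of the rule number is the output for input combination k, with ARG0 as the least significant bit) is an assumption inferred from the table entries rather than something stated in this section.

```python
def parity_rule(n_args, positions, even):
    """Rule number of an n_args-argument function that returns the even
    (or odd) parity of the arguments at the given positions and ignores
    the rest. Assumed convention: bit k = output for input combination k,
    ARG0 = least significant bit of k."""
    rule = 0
    for k in range(2 ** n_args):
        ones = sum((k >> p) & 1 for p in positions)
        output = (ones % 2 == 0) if even else (ones % 2 == 1)
        rule |= int(output) << k
    return rule


# A few entries of table 21.4, recomputed under the assumed convention.
CHECKS = {
    "(ODD-2-PARITY ARG0 ARG1), 2 args": parity_rule(2, (0, 1), even=False),
    "(EVEN-3-PARITY ARG0 ARG1 ARG2), 3 args": parity_rule(3, (0, 1, 2), even=True),
    "(ODD-3-PARITY ARG0 ARG1 ARG2), 3 args": parity_rule(3, (0, 1, 2), even=False),
    "(ODD-2-PARITY ARG0 ARG1), 4 args": parity_rule(4, (0, 1), even=False),
    "(ODD-2-PARITY ARG1 ARG3), 4 args": parity_rule(4, (1, 3), even=False),
    "(EVEN-2-PARITY ARG0 ARG3), 4 args": parity_rule(4, (0, 3), even=True),
    "(ODD-2-PARITY ARG2 ARG3), 5 args": parity_rule(5, (2, 3), even=False),
}
```

Under this convention the seven values are 6, 105, 150, 26,214, 13,260, 43,605, and 267,390,960 respectively, which are the rule numbers listed beside those parity rules in table 21.4.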
21.4 RESULTS FOR THE EVEN-4-PARITY PROBLEM
We made 25 runs of the even-4-parity problem to further test the evolutionary method of determining the number of automatically defined functions and the number of arguments they each possess. All of these runs produced a 100%-correct solution (scoring 16 hits).
All of the 25 solutions employed one or more automatically defined functions, even though it is, of course, possible to solve this problem without
automatically defined functions. An examination of the 258 best-of-generation individuals among these 25 successful runs showed that?Sfof these 258
intermediate best-of-generation programs employed automatically defined functions. That is, when the number of automatically defined functions is open to evolutionary determination, almost all of the best-of-generation programs produced along the way employ automatically defined functions for this problem.
As with the even-5-parity problem, there is no convergence of architecture among the various runs of the even-4-parity problem. Instead, there is a wide variation among the argument maps of these 25 overall programs (with only two argument maps being repeated). The average number (3.08) of automatically defined functions for these 25 solutions happens to be the same as for the even-5-parity problem.
Figure 21.22 shows the trajectory, by generation, of the best-of-generation programs of the {2,4,4,3,5} run (row 22 of table 21.5) of the even-4-parity problem according to their raw fitness (hits) and the number of automatically defined functions.
Figure 21.23 is a three-dimensional histogram for the {2,4,4,3,5} run of the even-4-parity problem showing, by generation, the number of programs in the population of 4,000 with a specified number (from 0 to 5) of automatically defined functions. As can be seen, programs with five automatically defined functions quickly start to dominate the population and by generation 14, almost all of the population has five automatically defined functions.
Table 21.4 Characteristics of the ADFs of 26 solutions to the even-5-parity problem with evolution of architecture.
Run | Argument map for ADFs | Rule number for ADF0 | Is ADF0 a parity rule? | Rule number for ADF1 | Is ADF1 a parity rule? | Rule number for ADF2 | Is ADF2 a parity rule?
1 | {2} | 6 | (ODD-2-PARITY ARG0 ARG1) | | | |
2 | {3} | 105 | (EVEN-3-PARITY ARG0 ARG1 ARG2) | | | |
3 | {3,3} | 1 | No | 150 | (ODD-3-PARITY ARG0 ARG1 ARG2) | |
4 | {3,5} | 193 | No | 1,515,870,810 | (ODD-2-PARITY ARG0 ARG2) | |
5 | {3,5} | 195 | (EVEN-2-PARITY ARG1 ARG2) | 1,437,226,410 | (ODD-2-PARITY ARG0 ARG3) | |
6 | {4,2} | 53,535 | No | 6 | (ODD-2-PARITY ARG0 ARG1) | |
7 | {4,3} | 94,952 | No | 147 | No | |
8 | {4,4} | 26,214 | (ODD-2-PARITY ARG0 ARG1) | 55,253 | No | |
9 | {5,3} | 56,034,125 | No | 150 | (ODD-3-PARITY ARG0 ARG1 ARG2) | |
10 | {2,1,3} | 9 | (EVEN-2-PARITY ARG0 ARG1) | [illegible] | No | 102 | (ODD-2-PARITY ARG0 ARG1)
11 | {2,4,3} | 6 | (ODD-2-PARITY ARG0 ARG1) | 7,395 | No | 150 | (ODD-3-PARITY ARG0 ARG1 ARG2)
12 | {3,1,4} | 165 | (EVEN-2-PARITY ARG0 ARG2) | 0 | No | 13,260 | (ODD-2-PARITY ARG1 ARG3)
13 | {4,3,5} | 26,214 | (ODD-2-PARITY ARG0 ARG1) | 103 | No | 4,294,770,684 | No
14 | {5,2,2} | 547,885,220 | No | 9 | (EVEN-2-PARITY ARG0 ARG1) | 9 | (EVEN-2-PARITY ARG0 ARG1)
15 | {5,5,1} | 4,294,967,295 | No | 267,390,960 | (ODD-2-PARITY ARG2 ARG3) | 2 | No
16 | {5,5,2} | 2,947,526,575 | No | 3,284,386,755 | (EVEN-2-PARITY ARG1 ARG2) | 15 | No
17 | {2,4,3,5} | 6 | (ODD-2-PARITY ARG0 ARG1) | 31,354 | No | 156 | No
18 | {3,2,2,3} | 238 | No | 9 | (EVEN-2-PARITY ARG0 ARG1) | 6 | (ODD-2-PARITY ARG0 ARG1)
19 | {3,4,5,5} | 63 | No | 43,605 | (EVEN-2-PARITY ARG0 ARG3) | 2,700,452,085 | No
20 | {4,3,1,1} | 15,076 | No | 195 | (EVEN-2-PARITY ARG1 ARG2) | 0 | No
21 | {5,3,3,5} | 1,717,986,918 | (ODD-2-PARITY ARG0 ARG1) | 0 | No | 105 | (EVEN-3-PARITY ARG0 ARG1 ARG2)
22 | {5,4,2,1} | [illegible] | No | 43,690 | No | 11 | No
23 | {5,5,2,5} | 2,779,620,781 | No | 2,779,768,800 | No | 12 | No
24 | {2,4,5,1,5} | 12 | No | 43,253 | No | 1,174,554,114 | No
25 | {3,1,2,2,2} | 55 | No | 0 | No | 9 | (EVEN-2-PARITY ARG0 ARG1)
26 | {3,2,3,3,3} | 152 | No | 6 | (ODD-2-PARITY ARG0 ARG1) | 1 | No

Table 21.4 (continued). Runs 1 through 16 have no ADF3 or ADF4.
Run | Rule number for ADF3 | Is ADF3 a parity rule? | Rule number for ADF4 | Is ADF4 a parity rule?
17 | 818,884,815 | No | |
18 | 167 | No | |
19 | 2,857,740,885 | (EVEN-2-PARITY ARG0 ARG3) | |
20 | [illegible] | No | |
21 | 809,250,876 | No | |
22 | [illegible] | No | |
23 | 2,857,740,885 | (EVEN-2-PARITY ARG0 ARG3) | |
24 | 0 | No | 2,868,882,175 | No
25 | [illegible] | [illegible] 2-parity rule of ARG0 ARG1 | [illegible] | No
26 | 64 | No | 105 | (EVEN-3-PARITY ARG0 ARG1 ARG2)
Figure 21.22 Fitness-branch trajectory between generations 0 and 14 of the {2,4,4,3,5} run of the even-4-parity problem with evolution of architecture.
Figure 21.23 Branch histogram between generations 0 and 14 for the {2,4,4,3,5} run of the even-4-parity problem.
Figure 21.24 Performance curves for the even-4-parity problem with evolution of architecture showing that E_with = 120,000 with ADFs.
The average structural complexity, S_with, of the solutions of the even-4-parity problem over 25 successful runs (out of 25 runs) is 130.9 points using the evolutionary method of determining the architecture.
Figure 21.24 presents the performance curves based on the 25 runs of the even-4-parity problem with the evolutionary selection of architecture. The cumulative probability of success, P(M,i), is 92% by generation 14 and is 100% by generation 50. The two numbers in the oval indicate that if this problem is run through to generation 14, processing a total of E_with = 120,000 individuals (i.e., 4,000 x 15 generations x 2 runs) is sufficient to yield a solution to this problem with 99% probability.
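The arithmetic behind these numbers can be sketched in a few lines. The Python example below assumes the usual definition of computational effort used with performance curves of this kind: the number of independent runs needed is R(z) = ceil(log(1 - z) / log(1 - P(M,i))), and the number of individuals to process is M x (i + 1) x R(z), counting generation 0. The formula itself is not restated in this section, so it is an assumption here, and the function names are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """Independent runs needed so that at least one succeeds with
    probability z, given per-run cumulative success probability p_success."""
    if p_success <= 0.0:
        raise ValueError("no run succeeds by this generation")
    if p_success >= z:
        return 1  # a single run already reaches the target probability
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(population_size, generation, p_success, z=0.99):
    """M x (i + 1) x R(z): generations 0 through i are all processed."""
    return population_size * (generation + 1) * runs_required(p_success, z)


# Even-4-parity with evolution of architecture: P(M,i) = 92% at generation 14.
E_WITH = individuals_to_process(4000, 14, 0.92)
```

For P(M,i) = 0.92, log(0.01) / log(0.08) is about 1.82, so two runs are required and `E_WITH` is 4,000 x 15 x 2 = 120,000. The same sketch gives 76,000 and 80,000 for a 100% success probability at generations 18 and 19, where one run suffices.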
To put the above value of E_with of 120,000 with the evolution of architecture for the even-4-parity problem in context, we made three series of runs with the same population size of 4,000 without the evolution of architecture. The first two series used automatically defined functions with fixed argument maps for their ADFs of {3} and {3,3}, respectively. We obtained values of E_with of 76,000 and 80,000 for these two series of runs. We then made an additional series of runs without automatically defined functions and obtained the value of E_without of 276,000. Thus, for the even-4-parity problem, the evolutionary method of determining the architecture requires more computational effort, E, than the two particular tested fixed argument maps, but less than if automatically defined functions are not used at all.
The details of these three series of runs follow.
Figure 21.25 presents the performance curves based on 30 runs of the even-4-parity problem using automatically defined functions with a fixed argument map for the ADFs of {3}. The cumulative probability of success, P(M,i), is 100% by generation 18. The two numbers in the oval indicate that if this problem is run through to generation 18, processing a total of E_with = 76,000 individuals (i.e., 4,000 x 19 generations x 1 run) is sufficient to yield a solution to this problem with 99% probability.
Figure 21.25 Performance curves for the even-4-parity problem showing that E_with = 76,000 with ADFs having a fixed argument map of {3}.
Figure 21.26 Performance curves for the even-4-parity problem showing that E_with = 80,000 with ADFs having a fixed argument map of {3,3}.
Figure 21.26 presents the performance curves based on 37 runs of the even-4-parity problem using automatically defined functions with a fixed argument map for the ADFs of {3,3}. The cumulative probability of success, P(M,i), is 100% by generation 19. The two numbers in the oval indicate that if this problem is run through to generation 19, processing a total of E_with = 80,000 individuals (i.e., 4,000 x 20 generations x 1 run) is sufficient to yield a solution to this problem with 99% probability.
Figure 21.27 Performance curves for the even-4-parity problem showing that E_without = 276,000 without ADFs.
Figure 21.28 Performance curves for the even-3-parity problem with evolution of architecture showing that E_with = 44,000 with ADFs.
Figure 21.27 presents the performance curves based on 40 runs of the even-4-parity problem without automatically defined functions. That is, the argument map for the ADFs is fixed and is the empty map {}. The cumulative
probability of success, P(M,i), is 100% by generation 22. The two numbers in
Table 21.5 Characteristics of the ADFs of the 25 solutions to the even-4-parity problem with evolution of architecture.
Run  Argument map for ADFs  Rule number for ADF0  Is ADF0 a parity rule?  Rule number for ADF1  Is ADF1 a parity rule?  Rule number for ADF2  Is ADF2 a parity rule?
{21 (ODD-2 _PARITY
ARGO ARGI)
l4l 20,235 No
[2,5l No 4,027,576,335 (EVEN-2-pARrry
ARG2 ARG3)
12,51 (ODD-2 _PARITY
ARGO ARGI)
3,996,380,723 No
{4,41 40,975 No 4,573 No
14,41 35,779 No 17,488 No
{5,2} 511,,319,674 No 15 No
{5,4) 267,390,960 (oDD-2-pARrry
ARG2 ARG3 )
56,729 No
[5,5] 100,599,295 No 4,233,362,575 No
10 {1,L,4l No No
[. {1,,5,2} No L,437226,410 (oDD-2-pARrry
ARGO ARG3)
( EVEN-2 - PARITY
ARGO ARG1)
12 12,r,4l (ODD_2 -PARITY
ARGO ARG1)
No 65,535 No
13 .2,1.,51 ( ODD_2 _ PARITY
ARGO ARGI)
No 2,852,497,925 No
74 {2,3,21 ( EVEN-2 10 -PARITY
ARGO ARGI)
No No
15 12,4,41 ( EVEN_2 40 -PARITY
ARGO ARGI)
t7,822 No No
16 {i4,4,3| 61,030 No 38,058 17 No
17 [5,5,41 3,755,997,007 No 3,284,386,755 (EVEN-2 - pARrry
ARG1 ARG2)
No
18 (2,2,5,41 (ODD_2 _PARITY
ARGO ARG1)
( oDD- 2 - PARI TY 4,283,826,005
ARGO ARGI)
No
T9 {.3,4,4,31 No 57,054 No 7,501 No
20 13,4,5,31 155 No s3,110 No 3,539,718,907 No
21 11.,3,'J.,2,31 No 102 ( ODD-2 _PARITY
ARGO ARGI)
No
22 (2,4,4,3,51 No 13,090 No 11,835 No
23 12,4,5,5,21 (ODD-2 - PARITY
ARGO ARGI)
50,634 No 3,547,583,347 No
24 14,2,3,3,41 68 No ( EVEN_2 _PARITY 7U
ARGO ARG1)
No
25 {5, 3, 5, 5, 41 2,779,w6,48s( EVEN_2 _PARITY
ARGO ARG2)
181 No L,195,853,639 No
Run  Rule number for ADF3  Is ADF3 a parity rule?  Rule number for ADF4  Is ADF4 a parity rule?
10
11
t2
13
t4
15
L6
17
No
19 (ODD-2 - PARITY
ARGO ARG2 )
No
2I (ODD_2 _PARITY
ARGO ARG1)
(ODD_2'PARITY
ARG1 ARG2 )
1.63 3,423,718,417
2,863,3t1,530 (ODD_2 _PARITY
ARGO ARGI)
,24 51 No No
3,805,274,931 No 9,260 No
Table 21.6 Characteristics of the ADFs of 40 solutions to the even-3-parity problem with evolution of architecture.
Run  Argument map for ADFs  Rule number for ADF0  Is ADF0 a parity rule?  Rule number for ADF1  Is ADF1 a parity rule?  Rule number for ADF2  Is ADF2 a parity rule?
al ( ODD-2 -PARITY
ARGO ARG1)
{3} 90 (ODD_2 _PARITY
ARGO ARG2 )
3
4
5
6
7
122 No
2,21
165 (EVEN-2 _PARITY ARGO ARG2 )
I,I79,010,630 No
J o (ODD-2 -PARITY
ARGO ARG1)
13,21 1t2 ( ODD_2 _ PARTTY
ARGO ARGI)
8
9
(ODD-2 -PARITY
ARGO ARG2 )
1
11
12
13
64,504
20,563 No
28,704 o
4,285,529,967 o
2,576,980,377
1 No
104 No
45 No
( EVEN_2 0 No _ PARITY
ARGO ARG1)
(ODD_2 _PARITY
ARG2 ARG3)
9 (EveN-z-PARrry 64 No 1,6,464 No
ARGO ARG1)
18 13,3,2) r% No 6 (onr_z-pARrry
ARGO ARGI)
19 13,4,21 oNo 24,415 No 9 (rvru-2-PARrry
ARGO ARG1)
3,4l
22 1.5,2,21 7,996,846,853 No No 6 (oon-z-PARrrY
ARGO ARG1)
23 [5,3,31 983,055 No
ARGO ARG2)
90 (oDD-2-PARrrY 238 No
?=4 19,9,!) ..
t,otr,ret,ros No ur No seu,erm
25 {1,1,,2,41 2 No 6 (oDD-2-pARrry
ARGO ARG1)
26 12,4,7,21 9 (svnN-2*PARrry 30,855 No eNo
ARGO ARG1)
28 13,2,2,21 64 No 9 (EVEN-2-pARrry f
29 13,2,5,41
3,3,1.,2
31 3.5.5.4
4,I,5,4
No
No
2s0 No
25s No
1,712,416,273 No 1.,432,966,505 No
sNo
184 (oDD-2-PARrrY 2,476,512,156 No
30 zNo
30,207 rNo 269A88,7M No
5,1.,3,31 2,694,881,,M0 oNo 61 No
34 {5,2,3,1,1 4,008,626,1.42 No 6 (ODD-2-PARITY
ARGO ARG1)
36 {2,5,5,2,21 6 @olt-z-pARrry 2,s22,305,s74 No 3,435,923^ffi6 l\o
ARGO ARG1)
32
33
37 {.3,3,5,4,41 195 (EVEN-2-PARrry
ARG1 ARG2)
39 15,1,,4,1,31 1,515,820,810 (oDD-2-pARTry o No 2g,1g0 (oDD-2-pARrry
ARGO ARG2 ) ARGO ARG2)
248 No 96,1.42,779 No
40 15,4,3,4,2}'t,41.4,812,7s6 (ODD_2 _PARTTY
ARG1 ARG3)
No 13,260 No
Run  Rule number for ADF3  Is ADF3 a parity rule?  Rule number for ADF4  Is ADF4 a parity rule?
10
11
12
13
T4
15
t6
t7
18
19
2T
11.565 No
No
(ODD_2_PARTTY
ARGO ARG2)
(ODD_2_PARITY
ARGO ARG]. )
31 55,M8
109
35 T4 No
11 No
37 27,246 No
t72 No
No No
40 43,516 No No
the oval indicate that if this problem is run through to generation 22, processing a total of E_without = 276,000 individuals (i.e., 4,000 x 23 generations x 3 runs) is sufficient to yield a solution to this problem with 99% probability.
Table 21.5 summarizes the 25 solutions to the even-4-parity problem in the same manner as table 21.4. There is a wide variation among the argument maps of these 25 solutions (with only two argument maps being repeated). The average number (3.08) of automatically defined functions for these 25 solutions happens to be the same as for the even-5-parity problem. Of these 25 successful runs, 40% solve the problem without using a lower-order parity function and 60% invoke a parity function of order two. No parity functions of order higher than two are invoked in solving this parity problem of order four.
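The two summary statistics quoted for the argument maps (the average number of ADFs and the number of repeated maps) follow directly from the first two columns of tables 21.4 and 21.5. The Python sketch below is illustrative only: the argument maps are transcribed from those columns as best they can be read, so the two lists are an assumption of this example, and the function names are hypothetical.

```python
from collections import Counter

# Argument maps as read from table 21.4 (even-5-parity, 26 runs).
EVEN_5_MAPS = [
    (2,), (3,), (3, 3), (3, 5), (3, 5), (4, 2), (4, 3), (4, 4), (5, 3),
    (2, 1, 3), (2, 4, 3), (3, 1, 4), (4, 3, 5), (5, 2, 2), (5, 5, 1), (5, 5, 2),
    (2, 4, 3, 5), (3, 2, 2, 3), (3, 4, 5, 5), (4, 3, 1, 1), (5, 3, 3, 5),
    (5, 4, 2, 1), (5, 5, 2, 5), (2, 4, 5, 1, 5), (3, 1, 2, 2, 2), (3, 2, 3, 3, 3),
]

# Argument maps as read from table 21.5 (even-4-parity, 25 runs).
EVEN_4_MAPS = [
    (2,), (4,), (2, 5), (2, 5), (4, 4), (4, 4), (5, 2), (5, 4), (5, 5),
    (1, 1, 4), (1, 5, 2), (2, 1, 4), (2, 1, 5), (2, 3, 2), (2, 4, 4), (4, 4, 3),
    (5, 5, 4), (2, 2, 5, 4), (3, 4, 4, 3), (3, 4, 5, 3), (1, 3, 1, 2, 3),
    (2, 4, 4, 3, 5), (2, 4, 5, 5, 2), (4, 2, 3, 3, 4), (5, 3, 5, 5, 4),
]


def average_adfs(maps):
    """Mean number of ADFs; the length of a map is its number of ADFs."""
    return sum(len(m) for m in maps) / len(maps)


def repeated_maps(maps):
    """Argument maps that occur in more than one run."""
    return {m for m, count in Counter(maps).items() if count > 1}
```

With these lists, `average_adfs(EVEN_4_MAPS)` is 77/25 = 3.08 and `average_adfs(EVEN_5_MAPS)` is 80/26, which also rounds to 3.08. `repeated_maps(EVEN_4_MAPS)` contains exactly {2,5} and {4,4}, in line with the remark that only two argument maps are repeated.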
21.5 RESULTS FOR THE EVEN-3-PARITY PROBLEM
Finally, we made 40 runs of the even-3-parity problem, all of which produced a 100%-correct solution (scoring 8 hits).
All of the 40 solutions employed one or more automatically defined functions. An examination of the 210 best-of-generation individuals among these 40 successful runs showed that 195 of these 210 intermediate best-of-generation programs employed automatically defined functions.
The average structural complexity, S_with, of the solutions of the even-3-parity problem over these 40 successful runs (out of 40 runs) is 125.1 points with the evolution of architecture.
Figure 21.28 presents the performance curves based on these 40 runs of the
even-3-parity problem with the evolutionary selection of architecture. The
cumulative probability of success, P(M, i), is 100% by generation 10. The two
numbers in the oval indicate that if this problem is run through to generation
10, processing a total of E_with = 44,000 individuals (i.e., 4,000 x 11 generations
x 1 run) is sufficient to yield a solution to this problem with 99% probability.
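The arithmetic behind the numbers in the oval can be sketched in a few lines of Python. This is an illustration, not code from the experiments: the function names are hypothetical, and the formula for the number of runs R(z) is the usual one for independent runs, R(z) = ceil(log(1 - z) / log(1 - P(M, i))), which is assumed here because this section only states the resulting products.

```python
import math


def runs_required(p_success: float, z: float = 0.99) -> int:
    """Number of independent runs R(z) needed to reach probability z.

    p_success is the cumulative probability of success P(M, i) for one run.
    The formula is an assumption of this sketch (independent runs).
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    if p_success == 1.0:
        return 1  # every run succeeds, so a single run suffices
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(m: int, generation: int, p_success: float,
                           z: float = 0.99) -> int:
    """E = M x (i + 1) x R(z): generations 0 through i are all processed."""
    return m * (generation + 1) * runs_required(p_success, z)


# Even-3-parity case from the text: M = 4,000, generation 10, P(M, i) = 100%.
print(individuals_to_process(4000, 10, 1.0))
```

With P(M, i) = 100% at generation 10 the sketch returns 4,000 x 11 x 1 = 44,000, matching the value of E_with quoted above.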
Table 21.6 summarizes the 40 solutions to the even-3-parity problem in the
same manner as tables 21.4 and 21.5. There is a wide variation among the
argument maps of these 40 solutions (with only two argument maps appearing more than once). The average number (3.08) of automatically defined
functions for these 40 solutions again happens to be the same as for the even-5-parity and the even-4-parity problems. Of these 40 successful runs, 40% solve
the problem without using a lower-order parity function and 60% invoke a
lower-order (i.e., order two) parity function.
21.6 SUMMARY
In this chapter we showed that it is possible to use the competitive fitness-driven evolutionary process to determine the architecture of the overall program to be evolved while solving the problem. This simultaneous evolution
takes more computer resources than the solution of the problem where the
architecture is prespecified for the Boolean even-5-parity problem.
Thus, this chapter and additional examples in subsequent chapters provide evidence to support main point 8:
Main point 8: Genetic programming is capable of simultaneously solving
a problem and evolving the architecture of the overall program.
22 Evolution of Primitives and Sufficiency
As previously mentioned, the second preparatory step in applying genetic
programming to a problem is to determine the set of primitive functions
of which the yet-to-be-evolved programs are composed. For example, the
set of primitive functions used throughout this book for the Boolean even-5-parity problem has consisted of AND, OR, NAND, and NOR. As mentioned
earlier, we chose this particular set of four primitive functions because we
knew that it satisfies both the sufficiency and closure requirements.
Indeed, NAND alone, NOR alone, the set {AND, NOT}, and the set {OR, NOT}
all satisfy both the sufficiency and closure requirements for any problem
of Boolean symbolic regression.
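The sufficiency of NAND alone can be checked exhaustively. The Python sketch below is only an illustration (the helper names are ours): it builds NOT, AND, OR, and NOR from NAND and compares each against Python's built-in Boolean operators on every input combination.

```python
from itertools import product


def nand(a: bool, b: bool) -> bool:
    return not (a and b)


# Every function below is composed of NAND only.
def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))


def nor(a: bool, b: bool) -> bool:
    return not_(or_(a, b))


def nand_is_sufficient() -> bool:
    """Check the NAND-only constructions against the built-in operators."""
    for a, b in product([False, True], repeat=2):
        if not_(a) != (not a):
            return False
        if and_(a, b) != (a and b):
            return False
        if or_(a, b) != (a or b):
            return False
        if nor(a, b) != (not (a or b)):
            return False
    return True


print(nand_is_sufficient())
```

Because AND, OR, and NOT together can express any Boolean function, the check above is enough to show that NAND by itself is computationally complete for this domain.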
Suppose that we did not know what set of primitive functions is sufficient
to solve a problem of symbolic regression for the Boolean even-5-parity function or, for some reason, did not want to make the decision of determining the
set of primitive functions for this problem.
One approach would be to choose a set of primitive functions from a large,
presumably sufficient superset. Experiments in Genetic Programming (section
24.3) showed that genetic programming is generally capable of selecting a
useful subset of primitive functions from a superset replete with extraneous
functions.
But suppose we wanted to evolve a set of primitive functions, rather than
merely home in on a subset of primitive functions within a prespecified
superset. We have already seen how genetic programming is capable of evolving automatically defined functions for various problems. The question arises
as to whether it is possible for genetic programming to evolve a set of primitive functions during a run at the same time that it is solving the problem and
evolving the architecture of the overall program. Presumably, primitive functions can be evolved in the same manner as automatically defined functions
are evolved. In other words, primitive functions can be viewed as automatically defined functions composed of very primitive ingredients. Of course,
before embarking on an attempt to evolve a set of primitive functions, one
must be clear that every representation ultimately comes down to some kind
of primitive. Thus, when we talk here about evolving a set of primitive functions, we are necessarily talking about a class of problems for which the usual
primitive functions have a yet more elementary representation.
In this chapter we demonstrate that genetic programming can, in fact, evolve
a sufficient set of primitive functions (in the sense described above) at the
same time as it solves the problem and evolves the architecture of the overall
program.
The idea is to start with at least one primitive function (called a PF) in an
overall program. The PFs would then be used to define the ADFs (if any are
present in the particular overall program). Then, both the PFs and the ADFs
(if present) are typically used in the result-producing branch.
In other words, the function set for the result-producing branch, F_rpb, will
consist of a yet-to-be-determined number of yet-to-be-evolved primitive functions (each taking an as-yet-to-be-chosen number of arguments) along with a
yet-to-be-determined number of yet-to-be-evolved automatically defined functions (each taking an as-yet-to-be-chosen number of arguments). The function set for each function-defining branch, if any, will consist of a
yet-to-be-determined number of yet-to-be-evolved primitive functions (each
taking an as-yet-to-be-chosen number of arguments) along with whatever
other automatically defined functions each function-defining branch is
entitled to reference hierarchically.
The function set for the result-producing branch, F_rpb, is
F_rpb = {ADF0, ...} ∪ {PF0, ...},
where the PFs are primitive functions (described in detail in section 22.1) that
will be evolved during the run of genetic programming. There is at least one
PF in F_rpb, but there need not be any ADFs in F_rpb.
The function set, F_adf0, for the first function-defining branch (defining ADF0),
if ADF0 is indeed present in a particular program, is
F_adf0 = {PF0, ...}.
There is at least one PF in F_adf0.
Given that the second function-defining branch (defining ADF1) is entitled
to refer hierarchically to ADF0, the function set for the second function-defining branch, if ADF1 is present, is
F_adf1 = {ADF0} ∪ {PF0, ...}.
The function sets for any subsequent function-defining branches, if present
in a particular program, are progressively defined in a similar way.
A population size of 4,000 is used throughout this chapter. The techniques
of structure-preserving crossover with point typing (chapter 21) are used
throughout this chapter.
Of course, in order to evolve a primitive function, it must be represented in
some more elementary way that permits us to define its behavior.
22.1 PRIMITIVE-DEFINING BRANCHES
Boolean primitive functions (e.g., AND, OR, NAND, NOR) admit of an elementary representation (the truth table) that gives us access to the definition of the
function. A primitive-defining branch for defining a PF may be viewed as a
truth table whose entries come from the set of the constants, ℜ_Boolean, of the
Boolean domain (i.e., T and NIL).
For example, suppose that, as a consequence of some evolutionary process,
primitive function PF0 came to have two arguments and came to be equivalent to the Boolean function we usually call NAND. Then the primitive-defining
branch for PF0 would be the truth table with four rows shown in table 22.1.
NAND is two-argument Boolean rule 7 and is one of the 16 possible two-argument Boolean functions.
In addition, suppose, as a consequence of some evolutionary process, that
a second primitive function, PF1, is evolved within the same overall program. PF1 might be the one-argument Boolean function that we usually call
NOT. In that event, the primitive-defining branch for PF1 would be the truth
table with two rows shown in table 22.2. NOT is one of four possible one-argument Boolean functions and is one-argument Boolean rule 1.
Similarly, suppose that yet another primitive function, PF2, is evolved within
the same overall program. Table 22.3 has eight rows and defines a three-argument function PF2 that is equivalent to (IF ARG0 ARG1 ARG2). The
IF function is three-argument Boolean rule 216 and is one of 256 possible
three-argument Boolean functions.
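The rule numbers quoted above can be reproduced with a short Python sketch. The numbering convention used here is an assumption inferred from tables 22.1 through 22.3: row r of the truth table is the row in which the arguments, read with ARG0 as the least significant bit, spell the binary number r, and a T in row r contributes 2^r to the rule number. The function name rule_number is hypothetical.

```python
def rule_number(fn, arity: int) -> int:
    """Rule number of a Boolean function under the assumed row ordering.

    Row r assigns bit k of r to ARGk (ARG0 is the least significant bit);
    an output of T in row r sets bit r of the rule number.
    """
    number = 0
    for row in range(2 ** arity):
        args = [bool((row >> k) & 1) for k in range(arity)]  # args[0] is ARG0
        if fn(*args):
            number |= 1 << row
    return number


def nand(arg0, arg1):
    return not (arg0 and arg1)


def not_(arg0):
    return not arg0


def if_(arg0, arg1, arg2):
    # (IF ARG0 ARG1 ARG2): ARG1 when ARG0 is T, otherwise ARG2
    return arg1 if arg0 else arg2


print(rule_number(nand, 2), rule_number(not_, 1), rule_number(if_, 3))
```

Under this convention the three functions come out as rules 7, 1, and 216, in agreement with the text.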
When the initial random population is created, each overall program contains a randomly chosen number of primitive-defining branches (where this
random number is greater than or equal to one and less than or equal to some
maximum number of primitive-defining branches). Each primitive-defining
branch possesses a randomly chosen number of arguments; this random
choice is made independently for each primitive-defining branch (in the
same way as these choices were made for the number of arguments for each
ADF in chapter 21). Then, for each primitive-defining branch, a random constant (either T or NIL) is randomly chosen and associated with each of the 2^k
possible combinations of the now-known number of arguments, k, of that
branch. These random choices are made independently for each of the 2^k
possible combinations of arguments. In other words, we construct the last
column (the output) of the truth table by randomly inserting the constants T
or NIL.
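The creation step just described can be sketched as follows. This Python fragment is illustrative only: the function name and the default limits of four primitive-defining branches and three arguments are hypothetical parameters chosen for the example, not values stated in this section.

```python
import random


def random_primitive_branches(rng, max_branches=4, max_args=3):
    """Create the primitive-defining branches of one overall program.

    Returns a list of (k, table) pairs, where k is the number of arguments
    and table holds the 2**k truth-table outputs (True for T, False for NIL).
    max_branches and max_args are assumed limits for this sketch.
    """
    number_of_branches = rng.randint(1, max_branches)  # at least one PF
    branches = []
    for _ in range(number_of_branches):
        k = rng.randint(1, max_args)  # chosen independently per branch
        table = [rng.choice([True, False]) for _ in range(2 ** k)]
        branches.append((k, table))
    return branches


rng = random.Random(1994)  # fixed seed so the example is repeatable
for index, (k, table) in enumerate(random_primitive_branches(rng)):
    outputs = " ".join("T" if entry else "NIL" for entry in table)
    print(f"PF{index} ({k} arguments): (truth-table {outputs})")
```

Each branch is fully determined by its output column, which is why a list of 2^k Boolean entries is a sufficient representation here.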
During the run, structure-preserving crossover with point typing will be
performed on the various branches of the overall program. When neither
crossover point is in a primitive-defining branch, the structure-preserving
crossover proceeds in the same manner as in section 21.2. However, from
time to time, one or both crossover points will fall within a primitive-defining
branch.
If both crossover points are in primitive-defining branches, then only the
terminals in the truth table are noninvariant points. In this case, the crossover operation simply inserts the constant (T or NIL) residing at the crossover point in the truth table of the contributing parent into the receiving parent
(at the point of insertion in its truth table).
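When both crossover points lie in truth tables, the operation reduces to copying one constant. The Python sketch below illustrates that case under simplifying assumptions: truth tables are plain lists of Booleans, both crossover points are drawn uniformly at random, and the function name is hypothetical.

```python
import random


def truth_table_crossover(contributing, receiving, rng):
    """Copy one truth-table constant from the contributing parent.

    The receiving parent keeps its own number of arguments (table length);
    only the single entry at the point of insertion can change.
    """
    child = list(receiving)
    source_row = rng.randrange(len(contributing))  # point in contributing parent
    target_row = rng.randrange(len(child))         # point of insertion
    child[target_row] = contributing[source_row]
    return child


rng = random.Random(0)
nand_table = [True, True, True, False]                           # rule 7
odd_3_parity_table = [False, True, True, False, True, False, False, True]  # rule 150
offspring = truth_table_crossover(odd_3_parity_table, nand_table, rng)
print(offspring)
```

The offspring differs from the receiving parent in at most one entry, so this form of crossover makes small moves through the space of Boolean rules.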
Table 22.1 Truth table representing the primitive-defining branch for two-argument
PF0 equivalent to NAND.
     ARG1  ARG0  PF0
0    NIL   NIL   T
1    NIL   T     T
2    T     NIL   T
3    T     T     NIL
Table 22.2 Truth table representing the primitive-defining branch for one-argument
PF1 equivalent to NOT.
     ARG0  PF1
0    NIL   T
1    T     NIL
Table 22.3 Truth table representing the primitive-defining branch for three-argument
PF2 equivalent to (IF ARG0 ARG1 ARG2).
     ARG2  ARG1  ARG0  PF2
0    NIL   NIL   NIL   NIL
1    NIL   NIL   T     NIL
2    NIL   T     NIL   NIL
3    NIL   T     T     T
4    T     NIL   NIL   T
5    T     NIL   T     NIL
6    T     T     NIL   T
7    T     T     T     T
If the crossover point in the contributing parent is in a primitive-defining
branch but the crossover point in the receiving parent is not in a primitive-defining branch, then we must consider whether the random constants,
ℜ_Boolean, were included in the terminal set of the non-primitive-defining
branch involved. For the remainder of this book, the random constants,
ℜ_Boolean, will always be included in the terminal sets of all branches. Given
that this is the case, point typing then permits a random constant from a primitive-defining branch to be inserted into any branch. Conversely, point typing
also permits a random constant (but nothing else) from a non-primitive-defining branch to be inserted as an entry in the truth table of a primitive-defining branch.
If the random constants, ℜ_Boolean, were not included in the terminal
sets of the non-primitive-defining branches, then the choice of a crossover
point from the contributing parent from within a primitive-defining branch
would mandate that the crossover point of the receiving parent be a point
lying in one of its primitive-defining branches.
Primitive-defining branches are implemented using a constrained syntactic structure. The following LISP code precisely defines the operation of the
two-argument PF0 for NAND shown in table 22.1:
1 (defun PF0 (arg0 arg1)
2   (values (IF arg1 (IF arg0 NIL
3                             T)
4                    (IF arg0 T
5                             T))))
In this definition for PF0, nested IFs are used to implement the truth table.
Line 2, for example, deals with the case where both ARG0 and ARG1 are T and
says that PF0 returns NIL for that case.
Figure 22.1 shows the above defun for PF0 as a rooted, point-labeled tree
with ordered branches. The points above the upper dotted line (i.e., the defun;
the function name, PF0; the argument list, (arg0 arg1); and the values)
are the usual invariant points of the constrained syntactic structure common
to all function definitions in this book. The six points between the two dotted
lines (i.e., the three IFs, the two arg0s, and the one arg1) constitute the
invariant points of the constrained syntactic structure for implementing a
two-argument truth table. The only variable points in this figure are the four
points below the lower dotted line. These four points are the entries of the
truth table and correspond to the four Boolean constants (i.e., NIL, T, T, and
T) that appear at the far right of lines 2, 3, 4, and 5 of the program above.
The above LISP code and figure precisely specify the operation of this primitive-defining branch. For presentation purposes, the truth table is abbreviated with a function called TRUTH-TABLE as follows:
(TRUTH-TABLE T T T NIL).
This expression is then interpreted as if all of the structure described above
were present.
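The relationship between the TRUTH-TABLE abbreviation and the nested IFs can be illustrated with a small Python sketch that prints the expanded S-expression. It is a hypothetical helper, and it assumes that the entries of TRUTH-TABLE are listed in row order 0, 1, 2, ... (with ARG0 as the least significant bit of the row number), which is the reading that makes (TRUTH-TABLE T T T NIL) equal to NAND.

```python
def expand_truth_table(table, arity):
    """Expand truth-table outputs into nested IFs, highest argument outermost.

    table[row] is the output (True for T, False for NIL) of the row whose
    binary representation gives the argument values, ARG0 being bit 0.
    """
    def build(level, row):
        if level < 0:
            return "T" if table[row] else "NIL"
        then_branch = build(level - 1, row | (1 << level))  # ARG<level> is T
        else_branch = build(level - 1, row)                 # ARG<level> is NIL
        return f"(IF ARG{level} {then_branch} {else_branch})"

    if len(table) != 2 ** arity:
        raise ValueError("table must have 2**arity entries")
    return build(arity - 1, 0)


# (TRUTH-TABLE T T T NIL) for the two-argument NAND of table 22.1
print(expand_truth_table([True, True, True, False], 2))
```

For NAND the sketch prints (IF ARG1 (IF ARG0 NIL T) (IF ARG0 T T)), whose four constants appear in the same order (NIL, T, T, T) as on lines 2 through 5 of the defun above.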
Since the primitive-defining branches are created at random at the initial
random generation, there is no guarantee that the particular set of primitive
functions belonging to an overall program will be sufficient to solve the problem. The potential insufficiency is automatically dealt with in two ways. First,
some individual overall programs in the initial random population may contain a sufficient set of primitive functions. They may consequently be more fit
and enjoy a differential advantage in the competitive evolutionary process.
Second, the primitive-defining branches are subject to crossover and are therefore subject to modification during the run.
Figure 22.1 Primitive-defining branch PF0 for implementing NAND.
Figure 22.2 shows an illustrative five-branch overall program for the even-5-parity problem. The first branch is a primitive-defining branch PF0 for the
one-argument NOT function; the second branch is a primitive-defining branch
PF1 for the two-argument OR function; and the third branch is a primitive-defining branch PF2 for the two-argument AND function. The fourth
branch is a function-defining branch ADF0 for the two-argument odd-2-parity function. Finally, the fifth branch is a result-producing branch
that invokes ADF0 four times and PF0 once in order to create the even-5-parity function. The points above the upper dotted line (i.e., the defun;
the function names, PF0, PF1, PF2, and ADF0; the argument lists, (ARG0),
(ARG0 ARG1), (ARG0 ARG1), and (ARG0 ARG1); and the five values
functions) are the usual invariant points of the constrained syntactic structure common to all function-defining branches and all result-producing
branches in this book. The points between the two dotted lines (i.e., the
IFs, the ARG0s, and the ARG1s) constitute the invariant points of the constrained syntactic structure for implementing truth tables. The only
noninvariant points associated with PF0, PF1, and PF2 are the ten points
below the lower dotted line. The body of ADF0 and the result-producing
branch also appear below the lower dotted line for those branches (indicating that they too are noninvariant points).
22.2 RESULTS FOR THE EVEN-5-PARITY PROBLEM
All of the 14 runs of the even-5-parity problem using the evolutionary method
of determining a sufficient set of primitive functions produced a 100%-correct solution (scoring 32 hits) by generation 50.
Table 22.4 shows the wide variation among these 14 solutions in the
number of primitive-defining branches, the number of arguments each PF
possesses, the number of function-defining branches, and the number of
arguments each automatically defined function possesses. The average number of PFs is 3.00 and the average number of automatically defined functions
is 1.43 for the 14 solutions.
Figure 22.2 Illustrative five-branch overall program with three primitive-defining branches,
one function-defining branch, and one result-producing branch.
Table 22.4 Distribution of architectures of the PFs and ADFs of 14 solutions to the
even-5-parity problem with evolution of primitives and sufficiency.
Run  Generation   Number   Argument map  Number   Argument map
     when solved  of PFs   for PFs       of ADFs  for ADFs
1    4            3        {2,3,1}       1        {1}
2    6            4        {2,1,3,3}     0        {}
3    3            3        {3,2,2}       2        {3,4}
4    1            4        {1,1,3,1}     0        {}
5    2            1        {2}           4        {3,2,4,3}
6    4            3        {3,1,3}       1        {3}
7    1            1        {3}           3        {4,2,4}
8    7            4        {3,3,1,2}     0        {}
9    10           2        {2,3}         0        {}
10   7            4        {1,3,3,2}     3        {1,2,1}
11   4            3        {3,3,1}       3        {2,4,3}
12   6            4        {1,3,3,2}     0        {}
13   1            2        {2,2}         0        {}
14   7            4        {3,1,3,3}     3        {4,1,2}
[Figure 22.3: performance curves, panel "With Defined Functions"; horizontal axis: Generation; annotations: M = 4,000, z = 99%, R(z) = 1, N = 14, E = 44,000 at generation 10.]
Figure 22.3 Performance curves for the even-5-parity problem showing that E_with = 44,000
with evolution of primitives and sufficiency.
The average structural complexity, S_with, of the 100%-correct programs from
the 14 successful runs (out of 14 runs) of the even-5-parity problem is 156.8
points with evolution of primitives and sufficiency.
Figure 22.3 presents the performance curves based on these 14 runs of the
even-5-parity problem with evolution of primitives and sufficiency. The cumulative probability of success, P(M, i), is 100% by generation 10. The two numbers in the oval indicate that if this problem is run through to generation 10,
processing a total of E_with = 44,000 individuals (i.e., 4,000 x 11 generations x 1 run) is sufficient to yield a solution to this problem with 99% probability.
An examination of one of the 14 successful runs illustrates several points
about the evolutionary method of determining a sufficient set of primitive
functions. In generation 0 of run 1 from table 22.4, 97% of the 4,000 programs
score 16 hits (out of 32). The best of generation 0 of this run scores 24 hits
and consists of three primitive-defining branches and one function-defining
branch. The argument map for the three PFs is {2, 3, 1} and the argument map
for the one ADF is {1}.
(progn (defun PF0 (ARG0 ARG1)
         (truth-table T T T NIL))
       (defun PF1 (ARG0 ARG1 ARG2)
         (truth-table NIL T T NIL T NIL NIL T))
       (defun PF2 (ARG0)
         (truth-table NIL T))
       (defun ADF0 (ARG0)
         (values (PF2 (PF2 (PF1 (PF1 T ARG0 ARG0) (PF2 T) (PF2 NIL))))))
       (values (PF1 (PF1 (PF0 (ADF0 D3) (PF2 D3)) (PF0 (PF2 D1) (ADF0 D3))
                         (PF1 (PF1 D1 D1 D0) (PF0 D1 D3) (PF0 D2 D3)))
                    (PF1 (PF1 (PF1 NIL D0 D1) (PF1 D0 T D1) (PF0 D3 D0))
                         (PF0 (ADF0 NIL) (ADF0 D1))
                         (PF1 (ADF0 D1) (PF1 D1 D2 D2) (PF1 T D2 D4)))
                    (PF0 (PF2 (PF1 D4 D4 D3)) (ADF0 (PF0 D2 T)))))).
In this program the two-argument PF0 is the NAND function (rule 7) and
corresponds to the truth table shown in table 22.1. The three-argument PF1 is
the odd-3-parity function (rule 150). The one-argument PF2 is the identity
function (rule 2).
Although ADF0 contains 11 points and seems to be doing some work, it
proves to be the "Always False" function (one-argument Boolean rule 0).
In other words, ADF0 has recreated the already-available Boolean constant NIL.
The result-producing branch invokes the "Always False" ADF0 six times,
PF0 eight times, PF1 12 times, and the identity function PF2 three times.
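These observations about the best of generation 0 can be checked by transcribing the program into Python. The sketch below is our transcription, not code from the run: it assumes that the entries of truth-table are listed in row order with ARG0 as the least significant bit, that the first argument of a call binds to ARG0, and that the fitness cases are all 32 combinations of D0 through D4.

```python
from itertools import product

T, NIL = True, False


def truth_table(*outputs):
    """Build a function from truth-table outputs (row order, ARG0 is bit 0)."""
    def fn(*args):
        row = sum(1 << k for k, value in enumerate(args) if value)
        return outputs[row]
    return fn


PF0 = truth_table(T, T, T, NIL)                      # NAND, rule 7
PF1 = truth_table(NIL, T, T, NIL, T, NIL, NIL, T)    # odd-3-parity, rule 150
PF2 = truth_table(NIL, T)                            # identity, rule 2


def ADF0(arg0):
    return PF2(PF2(PF1(PF1(T, arg0, arg0), PF2(T), PF2(NIL))))


def best_of_generation_0(D0, D1, D2, D3, D4):
    return PF1(
        PF1(PF0(ADF0(D3), PF2(D3)), PF0(PF2(D1), ADF0(D3)),
            PF1(PF1(D1, D1, D0), PF0(D1, D3), PF0(D2, D3))),
        PF1(PF1(PF1(NIL, D0, D1), PF1(D0, T, D1), PF0(D3, D0)),
            PF0(ADF0(NIL), ADF0(D1)),
            PF1(ADF0(D1), PF1(D1, D2, D2), PF1(T, D2, D4))),
        PF0(PF2(PF1(D4, D4, D3)), ADF0(PF0(D2, T))))


def even_parity(bits):
    return sum(bits) % 2 == 0


hits = sum(best_of_generation_0(*bits) == even_parity(bits)
           for bits in product([NIL, T], repeat=5))
print("ADF0 outputs:", [ADF0(NIL), ADF0(T)], "hits:", hits)
```

Under these assumptions ADF0 returns NIL for both inputs and the program scores 24 of the 32 fitness cases, in agreement with the description above.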
The best of generation 1 scores 26 hits. Although there are three primitive-defining branches in this individual, the argument map for the primitive-defining branches is {1, 3, 3} as opposed to {2, 3, 1} for the best of generation 0.
Like the best of generation 0, the best of generation 1 has one function-defining branch, but the argument map for its function-defining branch is {3},
instead of {1}.
(progn (defun PF0 (ARG0)
         (truth-table NIL NIL))
       (defun PF1 (ARG0 ARG1 ARG2)
         (truth-table NIL NIL NIL T T T T NIL))
       (defun PF2 (ARG0 ARG1 ARG2)
         (truth-table T NIL NIL NIL NIL NIL NIL T))
       (defun ADF0 (ARG0 ARG1 ARG2)
         (values (PF2 (PF1 (PF2 ARG2 ARG2 ARG2) (PF1 ARG2 ARG0 ARG0) (PF0 ARG0))
                      (PF1 (PF0 ARG0) (PF1 ARG2 NIL ARG0) (PF1 ARG0 ARG2 ARG2))
                      (PF0 (PF0 ARG1)))))
       (values (PF2 (ADF0 (ADF0 D1 D3 D3) (PF0 D0) (ADF0 D4 D4 T))
                    (ADF0 (PF0 NIL) (ADF0 NIL D0 D2) (ADF0 D2 D1 D0))
                    (PF2 (PF2 D3 D2 D0) (ADF0 T D2 D0) (PF0 T))))).
In this program PF0 is one-argument Boolean rule 0 (i.e., it has recreated
the already-available Boolean constant NIL). Here PF1 is the three-argument
Boolean rule 120. Three-argument PF2 is rule 129. This program scores better
than the best of generation 0 even though it does not have the seemingly
valuable odd-3-parity function.
ADF0 is three-argument Boolean rule 165, which is equivalent to
(EVEN-2-PARITY ARG0 ARG2).
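The claim about ADF0 can be confirmed by composing the three truth tables in Python. As before, this is an illustrative transcription: the row ordering (ARG0 as the least significant bit) and the binding of the first call argument to ARG0 are assumptions of the sketch, and the helper names are ours.

```python
from itertools import product

T, NIL = True, False


def truth_table(*outputs):
    """Build a function from truth-table outputs (row order, ARG0 is bit 0)."""
    def fn(*args):
        row = sum(1 << k for k, value in enumerate(args) if value)
        return outputs[row]
    return fn


PF0 = truth_table(NIL, NIL)                              # rule 0
PF1 = truth_table(NIL, NIL, NIL, T, T, T, T, NIL)        # rule 120
PF2 = truth_table(T, NIL, NIL, NIL, NIL, NIL, NIL, T)    # rule 129


def ADF0(arg0, arg1, arg2):
    return PF2(PF1(PF2(arg2, arg2, arg2), PF1(arg2, arg0, arg0), PF0(arg0)),
               PF1(PF0(arg0), PF1(arg2, NIL, arg0), PF1(arg0, arg2, arg2)),
               PF0(PF0(arg1)))


def rule_number(fn, arity):
    """Bit r of the result is the output for row r (ARG0 is bit 0 of r)."""
    return sum(1 << row
               for row in range(2 ** arity)
               if fn(*[bool((row >> k) & 1) for k in range(arity)]))


adf0_rule = rule_number(ADF0, 3)
adf0_is_even_2_parity = all(ADF0(a0, a1, a2) == (a0 == a2)
                            for a0, a1, a2 in product([NIL, T], repeat=3))
print(adf0_rule, adf0_is_even_2_parity)
```

The composition evaluates to rule 165 and ignores ARG1 entirely, which is exactly the behavior of (EVEN-2-PARITY ARG0 ARG2).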
The best of generation 2 scores 28 hits. The three PFs and one ADF are identical to the best of generation 1; however, the result-producing branch has
changed to
(values (PF2 (ADF0 (ADF0 D1 D3 D3) (PF0 D0) (ADF0 D4 D4 T))
             (ADF0 (PF0 NIL) (ADF0 NIL D0 D2) (ADF0 D2 D1 D0))
             (PF2 (PF2 D3 D2 D0) (PF0 NIL) (PF0 T)))).
This 28-scoring program remains the best for generation 3.
The problem is solved on generation 4 by a program scoring 32 hits. Compared to the best of the previous generation, this program has a different
argument map for its primitive-defining branches, {2, 3, 1}, and a different
argument map for its function-defining branches, {1}. The three PFs and one
ADF in this solution are the same as in the best of generation 0. This program's
100%-correct performance derives from differences in the result-producing
branch from that seen in generation 0:
(values (PF1 (PF1 D1 D0 D1)
             (PF1 (PF1 (PF1 NIL D0 D1) (PF1 D0 T D1) (PF0 T D3))
                  (PF0 (ADF0 NIL) (ADF0 D1))
                  (PF1 (ADF0 D1) (PF1 D1 D2 D2) (PF1 T D2 D4)))
             (PF0 (PF2 (PF1 D4 D4 D3)) (ADF0 (PF0 D2 T))))).
If we substitute NAND for the four occurrences of PF0, ODD-3-PARITY for
the 10 occurrences of PF1, the constant NIL for the four occurrences of ADF0,
and delete the one occurrence of PF2 (the one-argument identity function),
this result-producing branch becomes
(values (ODD-3-PARITY
          (ODD-3-PARITY D1 D0 D1)
          (ODD-3-PARITY (ODD-3-PARITY (ODD-3-PARITY NIL D0 D1)
                                      (ODD-3-PARITY D0 T D1)
                                      (NAND T D3))
                        (NAND NIL NIL)
                        (ODD-3-PARITY NIL
                                      (ODD-3-PARITY D1 D2 D2)
                                      (ODD-3-PARITY T D2 D4)))
          (NAND (ODD-3-PARITY D4 D4 D3) NIL))).
This can be further simplified to the following:
(values (EVEN-2-PARITY
          D0
          (EVEN-2-PARITY
            (ODD-3-PARITY (ODD-2-PARITY D0 D1)
                          (EVEN-2-PARITY D0 D1)
                          (NOT D3))
            (ODD-2-PARITY (ODD-3-PARITY D1 D2 D2)
                          (EVEN-2-PARITY D2 D4))))),
which is a composition of ODD-3-PARITY, ODD-2-PARITY, and EVEN-2-PARITY functions that correctly mimics the behavior of the even-5-parity
function.
Table 22.5 shows, by generation, the characteristics of the best-of-generation programs of run 1 from table 22.4.
Table 22.5 Characteristics of the best-of-generation program for generations 0 through 4 of run 1 of the
even-5-parity problem with evolution of primitives and sufficiency.
Generation  Hits  Number   Argument     Rule number for    Number    Argument      Rule number of
                  of PFs   map of PFs   PF0   PF1   PF2    of ADFs   map of ADFs   ADF0  ADF1  ADF2  ADF3
0           24    3        {2,3,1}      7     150   2      1         {1}           0
1           26    3        {1,3,3}      0     120   129    1         {3}           165
2           28    3        {1,3,3}      0     120   129    1         {3}           165
3           28    3        {1,3,3}      0     120   129    1         {3}           165
4           32    3        {2,3,1}      7     150   2      1         {1}           0
Figure 22.4 depicts two three-dimensional trajectories showing, by generation, the number of arguments of the primitive-defining branches and
the number of arguments of the function-defining branches in the best-of-generation programs of this run of the even-5-parity problem. The three
axes are labeled "F0," "F1," and "F2," and these labels refer to the number
of arguments of PF0, PF1, and PF2 in connection with the primitive-defining branches. The first trajectory (shown with a broken line) traces,
by generation, the number of arguments of PF0, PF1 (if present), and PF2
(if present). Since the number of primitive-defining branches does not
exceed three for this particular run, it is possible to visualize this trajectory using a three-dimensional graph. For generation 0, this first trajectory starts at the point {2, 3, 1} in this three-dimensional space. For
generations 1-3, the trajectory goes to the point {1, 3, 3}. For generation 4,
the trajectory returns to the point {2, 3, 1}.
The second trajectory (shown with a solid line in figure 22.4) traces, by
generation, the number of arguments of ADF0, ADF1 (if present), and ADF2
(if present). We can similarly visualize this trajectory provided the number of function-defining branches does not exceed three (as is the case for
this run). The three labels F0, F1, and F2 refer to ADF0, ADF1, and ADF2 in
connection with the solid line representing the function-defining branches.
The values of F1 and F2 are zero for generations 0-4 of this run. This second trajectory starts with a value for ADF0 of 1 for generation 0. The trajectory goes to a value for ADF0 of 3 for generations 1-3, and returns to a
value for ADF0 of 1 for generation 4.
Figure 22.5 depicts two three-dimensional trajectories, by generation,
showing the raw fitness (hits) and the number of primitive-defining
branches and function-defining branches in the best-of-generation programs of this run. The axis labeled "branches" refers to both the number
of primitive-defining branches and the number of function-defining
branches. The first trajectory (shown with a broken line) traces, by generation, the number of primitive-defining branches. As it happens, the
number of primitive-defining branches is three for generations 0-4 of this
particular run. The second trajectory (shown with a solid line) traces, by
generation, the number of function-defining branches, which is 1 for generations 0 through 4 of this run.
Figure 22.6 is a three-dimensional histogram for run 1 of the even-5-parity problem with evolution of primitives and sufficiency. This histogram
shows, by generation, the number of programs in the population of 4,000
with a specified number of primitive-defining branches (from 1 to 3) and a
specified number of function-defining branches (from 0 to 4). As can be seen,
there is approximate equality at generation 0 in the number of programs with
one, two, and three primitive-defining branches and with zero, one, two, three,
and four function-defining branches. However, by generation 4, the most common configuration has three primitive-defining branches and one function-defining branch.
Figure 22.4 Argument trajectory of the number of arguments of the primitive-defining branches
and the number of arguments of the function-defining branches between generations 0 and 4
for the best-of-generation programs of run 1 of the even-5-parity problem with evolution of
primitives and sufficiency.
Figure 22.5 Fitness-branch trajectory showing the number of hits and the number of primitive-defining branches and the number of function-defining branches between generations 0
and 4 for the best-of-generation programs of run 1 of the even-5-parity problem with evolution
of primitives and sufficiency.
Figure 22.6 Branch histogram between generations 0 and 4 of the number of programs in the
population with various numbers of primitive-defining branches and function-defining branches
for run 1 of the even-5-parity problem with evolution of primitives and sufficiency.
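The 100%-correct program from generation 4 of run 1, and its simplification into parity functions, can both be checked exhaustively. The Python sketch below is our transcription under stated assumptions: truth-table entries are taken in row order with ARG0 as the least significant bit, ADF0 is replaced by the constant NIL that it computes, and the helper names (odd_parity, even_2_parity, and so on) are hypothetical.

```python
from itertools import product

T, NIL = True, False


def truth_table(*outputs):
    """Build a function from truth-table outputs (row order, ARG0 is bit 0)."""
    def fn(*args):
        row = sum(1 << k for k, value in enumerate(args) if value)
        return outputs[row]
    return fn


PF0 = truth_table(T, T, T, NIL)                      # NAND
PF1 = truth_table(NIL, T, T, NIL, T, NIL, NIL, T)    # odd-3-parity
PF2 = truth_table(NIL, T)                            # identity


def ADF0(arg0):
    return NIL  # the evolved ADF0 is the "Always False" function


def generation_4_solution(D0, D1, D2, D3, D4):
    return PF1(PF1(D1, D0, D1),
               PF1(PF1(PF1(NIL, D0, D1), PF1(D0, T, D1), PF0(T, D3)),
                   PF0(ADF0(NIL), ADF0(D1)),
                   PF1(ADF0(D1), PF1(D1, D2, D2), PF1(T, D2, D4))),
               PF0(PF2(PF1(D4, D4, D3)), ADF0(PF0(D2, T))))


def odd_parity(*bits):
    return sum(bits) % 2 == 1


def even_2_parity(a, b):
    return not odd_parity(a, b)


def simplified(D0, D1, D2, D3, D4):
    return even_2_parity(
        D0,
        even_2_parity(
            odd_parity(odd_parity(D0, D1), even_2_parity(D0, D1), not D3),
            odd_parity(odd_parity(D1, D2, D2), even_2_parity(D2, D4))))


cases = list(product([NIL, T], repeat=5))
hits_evolved = sum(generation_4_solution(*c) == (sum(c) % 2 == 0) for c in cases)
hits_simplified = sum(simplified(*c) == (sum(c) % 2 == 0) for c in cases)
print(hits_evolved, hits_simplified)
```

Both forms score 32 hits out of 32 under these assumptions, so the simplification preserves the behavior of the evolved result-producing branch.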
Table 22.6 shows, in its first two columns, the run number and the argument map of the primitive-defining branches for the 14 solutions to the even-5-parity problem with evolution of primitives and sufficiency. Each of the
next four pairs of columns relates to each of the four possible PFs that appear
in that solution. The Boolean rule number appears in the first column of each
such pair. If the rule is a parity rule of any kind, the second column of each
pair identifies the parity function.
Table 22.7 shows, in its first two columns, the run number and the argument map of the function-defining branches for the 14 solutions to the
even-5-parity problem with evolution of primitives and sufficiency. Each
of the next four pairs of columns relates to each of the four possible ADFs
that appear in that solution in the same manner as the previous table.
Table 22.6 Characteristics of the PFs of 14 solutions to the even-5-parity problem with
evolution of primitives and sufficiency.
Run  Argument map PF0 rule  Is PF0 a parity rule?           PF1 rule  Is PF1 a parity rule?
1    {2,3,1}
2    {2,1,3,3}              (EVEN-2-PARITY ARG0 ARG1)                 No
3    {3,2,2}      124
4    {1,1,3,1}              No                                        No
5    {2}                    (ODD-2-PARITY ARG0 ARG1)
6    {3,1,3}      150       (ODD-3-PARITY ARG0 ARG1 ARG2)             No
7    {3}          90        (ODD-2-PARITY ARG0 ARG2)
8    {3,3,1,2}    214       No                              90        (ODD-2-PARITY ARG0 ARG2)
9    {2,3}        729
10   {1,3,3,2}              No
11   {3,3,1}      36        No                              150       (ODD-3-PARITY ARG0 ARG1 ARG2)
12   {1,3,3,2}                                              162
13   {2,2}                  No                                        (EVEN-2-PARITY ARG0 ARG1)
14   {3,1,3,3}    176       No                                        No
Run  PF2 rule  Is PF2 a parity rule?           PF3 rule  Is PF3 a parity rule?
1              No
2    177       No
3              (EVEN-2-PARITY ARG0 ARG1)
4    105       (EVEN-3-PARITY ARG0 ARG1 ARG2)
5
6    108       No
7
8              No                              10        No
9
10   154       No                              10        No
11
12   105       (EVEN-3-PARITY ARG0 ARG1 ARG2)  15        No
13
14   94        No                              156       No
Table 22.7 Characteristics of the ADFs of 14 solutions to the even-5-parity problem with
evolution of primitives and sufficiency.
Run  Argument map ADF0 rule  Is ADF0 a parity rule?               ADF1 rule  Is ADF1 a parity rule?
1    {1}                     No
2    {}
3    {3,4}        770        No                                   38505      (EVEN-4-PARITY ARG0 ARG1 ARG2 ARG3)
4    {}
5    {3,2,4,3}    195        (EVEN-2-PARITY ARG1 ARG2)                       No
6    {3}          123        No
7    {4,2,4}      118        No                                              (EVEN-2-PARITY ARG0 ARG1)
8    {}
9    {}
10   {1,2,1}                 No                                   10         No
11   {2,4,3}      10         No                                              No
12   {}
13   {}
14   {4,1,2}      13,107     No                                              No
Run  ADF2 rule  Is ADF2 a parity rule?          ADF3 rule  Is ADF3 a parity rule?
1
2
3
4
5    39270      (ODD-3-PARITY ARG0 ARG1 ARG3)   165        (EVEN-2-PARITY ARG0 ARG2)
6
7    54,191     No
8
9
10
11              No
12
13
14              No
22.3 RESULTS FOR THE BOOLEAN 6-MULTIPLEXER PROBLEM
The inputs to the Boolean N-multiplexer function consist of k address bits a_i and
2^k data bits d_i, where N = k + 2^k. That is,
a_{k-1}, ..., a_1, a_0, d_{2^k-1}, ..., d_1, d_0.
Its output is the Boolean value of the particular data bit that is singled out by
the k address bits of the multiplexer. For example, if the two address bits, a_1
and a_0, of a Boolean 6-multiplexer (where k = 2) are 1 and 0, respectively, the
multiplexer singles out data bit d_2 (out of the 4) to be the output of the multiplexer because binary 10 equals 2. Thus, for an input of 100100, the output of the
multiplexer is 1; for an input of 101011, the output of the multiplexer is 0.
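The definition can be made concrete with a short Python sketch. It is an illustration only; the function name and the choice of a bit string as the input representation are ours, with the string read in the order a_{k-1}, ..., a_0, d_{2^k-1}, ..., d_0 given above.

```python
def multiplexer(bits: str) -> int:
    """Boolean N-multiplexer on a string of '0'/'1' characters.

    The string lists the k address bits first (a_{k-1} ... a_0), followed
    by the 2**k data bits (d_{2**k - 1} ... d_0).
    """
    n = len(bits)
    k = 1
    while k + 2 ** k < n:
        k += 1
    if k + 2 ** k != n or set(bits) - {"0", "1"}:
        raise ValueError("input must have k + 2**k binary digits")
    address = int(bits[:k], 2)
    data = bits[k:]  # data[0] is d_{2**k - 1}, data[-1] is d_0
    return int(data[len(data) - 1 - address])


print(multiplexer("100100"), multiplexer("101011"))
```

For the two inputs discussed in the text the address bits are 10, so data bit d_2 is selected, giving outputs 1 and 0 respectively.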
Genetic programming can, of course, solve the problem of symbolic regression of the 6-multiplexer function using a computationally complete set
of primitive functions such as
F_1 = {AND, OR, NOT}.
However, the solution to a higher-order multiplexer problem is facilitated
(Genetic Programming, subsection 24.3.1) when the function IF is added to F_1
so that the set of primitive functions becomes
F_2 = {IF, AND, OR, NOT}.
The reason for the improved performance with F_2 versus F_1
is that a higher-order Boolean multiplexer function can very naturally be
decomposed into lower-order multiplexers. The three-argument IF function
is the lowest-order multiplexer.
For example, the 6-multiplexer problem can be decomposed into three
instances of the 3-multiplexer subproblem. The 6-multiplexer is equivalent to
(3-MULTIPLEXER A1 (3-MULTIPLEXER A0 D3 D2)
                  (3-MULTIPLEXER A0 D1 D0)),
or, as it would more commonly be written,
(IF A1 (IF A0 D3 D2) (IF A0 D1 D0)).
Thus, the IF function is especially useful in solving a higher-order multiplexer problem.
Figure 22.7 shows this decomposition.
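The decomposition can be verified over all 64 input combinations. In the Python sketch below, which is only an illustration, mux6 is a direct table-lookup definition of the 6-multiplexer and if_ plays the role of the three-argument IF; both names are hypothetical.

```python
from itertools import product


def if_(condition, then_value, else_value):
    """Three-argument IF, i.e., the 3-multiplexer."""
    return then_value if condition else else_value


def mux6(a1, a0, d3, d2, d1, d0):
    """6-multiplexer defined directly: the address a1 a0 selects a data bit."""
    return (d0, d1, d2, d3)[2 * a1 + a0]


def decomposed(a1, a0, d3, d2, d1, d0):
    # (IF A1 (IF A0 D3 D2) (IF A0 D1 D0))
    return if_(a1, if_(a0, d3, d2), if_(a0, d1, d0))


agreement = sum(mux6(*case) == decomposed(*case)
                for case in product([0, 1], repeat=6))
print(agreement, "of 64 cases agree")
```

The outer IF consumes the high-order address bit and each inner IF consumes the low-order one, which is why three IFs suffice.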
Suppose that we do not know what set of primitive functions is sufficient
(or helpful) in solving the problem of symbolic regression of the Boolean
multiplexer or, for some reason, do not want to make the decision of choosing the set of primitive functions for this problem. Genetic programming can,
in fact, evolve a sufficient set of primitive functions for the multiplexer problem at the same time as it solves the problem and evolves the architecture of
the overall program.
Figure 22.7 Decomposition of the 6-multiplexer function into three calls to the 3-multiplexer
(IF) function.
In one run of the 6-multiplexer problem, the best of generation 0 scores 46
(out of 64) hits and has an argument map of {2,3} for its primitive-defining
branches and the empty argument map of {} for its function-defining branches
(i.e., it has no ADFs).
(progn (defun PF0 (ARG0 ARG1)
  (truth-table T NIL NIL NIL))
 (defun PF1 (ARG0 ARG1 ARG2)
  (truth-table T NIL NIL NIL T T NIL NIL))
 (values (PF0 (PF1 D5 D1 D0) (PF0 D4 D2)))).
In this program PF0 is the two-argument NOR function and PF1 is three-argument Boolean rule 49. NOR alone is, of course, computationally complete.
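The rule numbers quoted in this section are consistent with reading a truth table row 0 first and adding 2^r for every row r whose entry is T. That reading is inferred here from the numbers themselves (1 for NOR, 49 for PF1) rather than quoted from a definition, so the following Python sketch should be taken as an assumption-laden illustration; the name rule_number is hypothetical.

```python
def rule_number(truth_table):
    """Rule number of a Boolean truth table listed row 0 first.

    Assumed convention: row r contributes 2**r when its entry is "T".
    """
    return sum(2 ** row for row, entry in enumerate(truth_table) if entry == "T")


nor_table = ["T", "NIL", "NIL", "NIL"]
pf1_table = ["T", "NIL", "NIL", "NIL", "T", "T", "NIL", "NIL"]
print(rule_number(nor_table))  # 1
print(rule_number(pf1_table))  # 49
```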
The following 100%-correct program with an argument map of {1, 1, 3} for
its primitive-defining branches and an argument map of {} for its function-defining branches emerged on generation 14 of this run:
(progn (defun PF0 (ARG0)
  (truth-table T NIL))
 (defun PF1 (ARG0)
  (truth-table NIL T))
 (defun PF2 (ARG0 ARG1 ARG2)
  (truth-table T T NIL NIL T NIL T NIL))
 (values (PF0 (PF2 (PF0 (PF2 D3 D1 D5)) (PF0 (PF2 (PF0 (PF2
 D3 D0 D5)) (PF0 (PF2 D1 D2 D4)) (PF2 (PF2 D4 D0 D5) D5
 (PF2 D0 D2 D5)))) D4)))).
Here, PF0 is the one-argument NOT function and PF1 is the one-argument
identity function.
Table 22.8 is a truth table for PF2, which is equivalent to three-argument
rule 83 and (IF ARG2 (NOT ARG0) (NOT ARG1)). This doubly negative
IF rule, in conjunction with NOT, enables the result-producing branch to produce the desired behavior of the 6-multiplexer function.
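The behavior of this program can be checked by brute force. The Python sketch below makes two assumptions that are not stated in the passage: that the row index of a truth table has ARG0 as its least significant bit (the column order of table 22.8), and that D5 and D4 play the roles of the address bits a_1 and a_0 while D3 through D0 are the data bits. Under those assumptions the result-producing branch agrees with the 6-multiplexer on all 64 fitness cases. The helper table_function is hypothetical.

```python
from itertools import product


def table_function(truth_table):
    """Build a Boolean function from a truth table listed row 0 first.

    Assumed convention: row index = ARG0 + 2*ARG1 + 4*ARG2.
    """
    def f(*args):
        row = sum(bit << i for i, bit in enumerate(args))
        return truth_table[row]
    return f


T, NIL = 1, 0
PF0 = table_function([T, NIL])                          # NOT
PF2 = table_function([T, T, NIL, NIL, T, NIL, T, NIL])  # rule 83


def evolved(D0, D1, D2, D3, D4, D5):
    """Result-producing branch of the generation-14 program."""
    return PF0(PF2(PF0(PF2(D3, D1, D5)),
                   PF0(PF2(PF0(PF2(D3, D0, D5)),
                           PF0(PF2(D1, D2, D4)),
                           PF2(PF2(D4, D0, D5), D5, PF2(D0, D2, D5)))),
                   D4))


def multiplexer(D0, D1, D2, D3, D4, D5):
    # Assumption: D5 and D4 are the address bits a1 and a0.
    return (D0, D1, D2, D3)[2 * D5 + D4]


# PF2 agrees with (IF ARG2 (NOT ARG0) (NOT ARG1)) on all eight rows.
pf2_ok = all(PF2(a0, a1, a2) == ((1 - a0) if a2 else (1 - a1))
             for a0, a1, a2 in product((0, 1), repeat=3))
hits = sum(evolved(*c) == multiplexer(*c) for c in product((0, 1), repeat=6))
print(pf2_ok, hits)  # True 64
```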
In a second run, a solution emerged on generation 20 with an argument
map of {3,2} for its primitive-defining branches and an argument map of {4}
for its function-defining branches. Since this solution occupies an entire page,
we do not show it here. However, the important point is that PF0 in this
solution is the familiar three-argument IF function (rule 216) shown in table
22.3. In addition, PF1 is the two-argument NOR function and ADF0 is four-argument rule 29,490.
Table 22.8 Truth table representing the primitive-defining branch for three-argument PF2 equivalent to (IF ARG2 (NOT ARG0) (NOT ARG1)).
     ARG2  ARG1  ARG0  PF2
0    NIL   NIL   NIL   T
1    NIL   NIL   T     T
2    NIL   T     NIL   NIL
3    NIL   T     T     NIL
4    T     NIL   NIL   T
5    T     NIL   T     NIL
6    T     T     NIL   T
7    T     T     T     NIL
Thus, in both runs, some variant of the IF function evolved in one of
the primitive-defining branches of the solution to the 6-multiplexer problem. It is, of course, possible to represent the multiplexer function without
the IF function. However, this convergence apparently occurs because
the IF function is inherently so useful and helpful in constructing the
behavior of a multiplexer.
22.4 RESULTS FOR A SINGLE PRIMITIVE FUNCTION
In the foregoing two sections, we allowed genetic programming to dynamically discover the number of primitive-defining branches. Suppose, however,
that we are specifically interested in discovering a single primitive function
that is sufficient to solve a problem (assuming this is possible for the problem
domain involved).
Genetic programming may be used to solve the problem of identifying a
single primitive function sufficient to solve a given problem by restricting the
number of primitive functions to exactly one and then proceeding in the same
manner as the previous section.
The number of arguments possessed by the single primitive-defining branch
is not predetermined (unless that, too, is a goal of the effort). In any event, the
number of function-defining branches and the number of arguments they
each possess is not predetermined.
22.4.1 Boolean 6-Multiplexer Problem
In one run of the 6-multiplexer problem, the following 100%-correct solution
employing exactly one PF emerged in generation 11:
(progn (defun PF0 (ARG0 ARG1 ARG2)
  (truth-table NIL NIL NIL T T NIL T T))
 (defun ADF0 (ARG0)
  (values (PF0 T ARG0 ARG0)))
 (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
  (values (PF0 T ARG1 NIL)))
 (values (PF0 (PF0 T D4 D5) (PF0 (PF0 D0 D4 D4) (PF0 D5 D3
 D1) (PF0 D5 D2 D0)) (PF0 D5 D2 D0)))).
In this program PF0 is (IF D0 D1 D2). Indeed, as previously discussed, we
would expect some variant of the IF function to emerge as the single building block for constructing the multiplexer function.
22.4.2 Even-5-Parity Problem
In two runs of the even-5-parity problem (out of three runs made), a 100%-correct
solution employing exactly one PF emerged wherein the lone PF possesses two arguments and is equivalent to EVEN-2-PARITY (two-argument
rule 9).
In contrast, in the third run, a corpulent 100%-correct solution employing
exactly one PF emerged in generation 2. In this solution, the single PF possesses three arguments and is equivalent to rule 103. Rule 103 is not a parity
rule. The result-producing branch, in conjunction with this program's ADF,
produces the desired behavior of the even-5-parity function.
23 Evolutionary Selection of Terminals
The first preparatory step in applying genetic programming to a problem is
to determine the set of terminals from which the yet-to-be-evolved programs
will be composed.
In order to evolve a computer program capable of producing the desired
output for a given problem, it is necessary to have access to a set of inputs that
are at least a superset of the inputs necessary to solve the problem (that is, the
terminals must be sufficient for the problem). For example, if one is trying to
evolve a model for the quarterly price level of an economy, it may be necessary to have access to independent variables such as the quarterly gross national product and the quarterly money supply of the economy (since these
variables suffice to establish the price level). The yield of three-month U.S.
Treasury bills is not correlated to the price level of the economy. However,
given the gross national product, the money supply, and the short-term T-bill
yields, genetic programming is capable of evolving the well-known econometric exchange equation for the price level of the economy by identifying
and using the subset of inputs that are relevant to the problem (Genetic Programming, sections 10.3 and 24.1) while ignoring the irrelevant T-bill yields.
The question arises as to whether it is possible for genetic programming to
evolve the terminal set of a problem (in the sense of enabling genetic programming to select the inputs of the yet-to-be-evolved program from a sufficient superset of available inputs) during a run at the same time that genetic
programming is evolving a sufficient set of primitive functions, evolving the
architecture, and solving the problem.
In this chapter we demonstrate that genetic programming can, in fact, evolve
a terminal set in the above sense. The techniques of chapter 21 involving structure-preserving crossover with point typing are used to accomplish this.
23.1 PREPARATORY STEPS
A terminal set containing the five actual variables, D0, D1, D2, D3, and D4, is
sufficient to solve the even-5-parity problem. In the absence of one or more of
these five relevant variables, genetic programming cannot compose a 100%-correct
computer program to perform the behavior of the even-5-parity
function. However, suppose the terminal set were enlarged and became
the superset
T1 = {NOISE, D0, D1, D2, D3, D4, ℜ_Boolean}.
NOISE is an extraneous noise variable that is not correlated in any way to
the five actual variables of the problem (i.e., D0, D1, D2, D3, and D4) or to the
value of the even-5-parity function associated with these five actual variables.
On each instance when the terminal NOISE is encountered in any branch of
any program, NOISE is randomly set to either T or NIL by a call to a randomizer. The random choice for the value of NOISE is made independently and
anew on each instance when NOISE is encountered.
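One way to picture such a terminal is as a zero-argument function that draws a fresh value every time it is evaluated. The Python sketch below is only an illustration of that idea: the names make_noise and example_branch, the seed, and the use of Python's random module are all assumptions made here, not details of the runs described in this chapter.

```python
import random


def make_noise(rng):
    """Return a zero-argument NOISE terminal backed by the generator `rng`."""
    def noise():
        # A fresh, independent draw every time the terminal is encountered.
        return rng.random() < 0.5
    return noise


NOISE = make_noise(random.Random(0))  # the seed is an arbitrary choice


def example_branch(d0, d1):
    # Hypothetical branch in which NOISE appears twice; each appearance
    # triggers its own draw, so the two values need not agree.
    return (NOISE() and d0) or (NOISE() and d1)


draws = [NOISE() for _ in range(1000)]
print(sum(draws))  # roughly 500; the exact count depends on the seed
```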
As before, ℜ_Boolean is the random Boolean constant ranging over the
values T and NIL.
A population size of 4,000 and the techniques of structure-preserving crossover with point typing (chapter 21) are used throughout this chapter.
23.2 RESULTS FOR THE EVEN-5-PARITY PROBLEM
The fitness of a program in the even-5-parity problem is the sum, over the 32
fitness cases, of the errors between the value returned by the program and the
correct value of the even-5-parity function for that fitness case. If NOISE is
present in a part of a program that contributes to the eventual value returned
by the program for a fitness case, the effect is usually that the value returned
is not correlated to the correct value of the target even-5-parity function for
that fitness case.
In one run, 85% of the programs in generation 0 had a fitness of 16 and an
additional 7% of the programs had a fitness of either 15 or 17.
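The fitness measure just described can be sketched as follows. The Python function names are hypothetical, and programs are represented simply as Python callables. The sketch also shows that a program whose output is constant scores exactly 16, the value at which most of generation 0 clusters in the run reported above.

```python
from itertools import product


def even_parity(bits):
    """True (T) when an even number of the inputs are True."""
    return sum(bits) % 2 == 0


def fitness(program, n=5):
    """Sum of errors over all 2**n fitness cases (0 is best)."""
    return sum(program(*case) != even_parity(case)
               for case in product((False, True), repeat=n))


always_nil = lambda d0, d1, d2, d3, d4: False
perfect = lambda *d: even_parity(d)
print(fitness(always_nil), fitness(perfect))  # 16 0
```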
In generation 0 of one run of the even-5-parity problem with the noise variable, the best of generation 0 scores 23 hits (out of 32) and has an argument
map for its primitive-defining branches of {3} and an argument map for its
function-defining branches of {2,1}. PF0 is three-argument rule 85; ADF0 is
two-argument rule 10; ADF1 is one-argument rule 1 (NOT); and the result-producing branch contains one reference to NOISE.
Between generations 1 and 7, the number of hits for the best-of-generation
program is 25, 27, 28, 28, 30, 30, and 30 and the number of occurrences of
NOISE in the best-of-generation programs is 5, 3, 9, 5, 1, 3, and 3.
On generation 8, the following 100%-correct program emerged with an
argument map for its primitive-defining branches of {3} and an argument
map for its function-defining branches of {}:
(progn (defun PF0 (ARG0 ARG1 ARG2)
  (truth-table NIL NIL T T T NIL NIL NIL))
 (values (PF0 (PF0 (PF0 NOISE D3 D0) (PF0 D2 D4 D0)
 (PF0 D2 D4 D0)) (PF0 D0 D0 D4) (PF0 (PF0 D3 D0 D3)
 (PF0 NIL T D3) (PF0 NIL D2 D1))))).
Even in this 100%-correct program, NOISE appears once in
the result-producing branch.
In this program, PF0 is rule 28. When the entire program is rewritten by
expanding rule 28 and, for convenience, using NOT, EVEN-2-PARITY, and
ODD-2-PARITY, it becomes
(OR (AND (NOT (OR (AND (EVEN-2-PARITY D1 D2) (NOT D3))
                  (AND (ODD-2-PARITY D1 D2) D3)))
         (ODD-2-PARITY D0 D4))
    (AND (OR (AND (EVEN-2-PARITY D1 D2) (NOT D3))
             (AND (ODD-2-PARITY D1 D2) D3))
         (NOT (OR (AND (OR (AND (NOT D0) D4)
                           (AND D0 (NOT (OR D2 D4))))
                       (NOT (OR (OR (AND (NOT D0) D3)
                                    (AND D0
                                         (NOT (OR NOISE D3))))
                                (OR (AND (NOT D0) D4)
                                    (AND D0
                                         (NOT (OR D2 D4)))))))
                  (ODD-2-PARITY D0 D4))))).
The entire AND subexpression above that contains NOISE is equivalent to
NIL, so this expression can be simplified to
(OR (AND (NOT (OR (AND (EVEN-2-PARITY D1 D2) (NOT D3))
                  (AND (ODD-2-PARITY D1 D2) D3)))
         (ODD-2-PARITY D0 D4))
    (AND (OR (AND (EVEN-2-PARITY D1 D2) (NOT D3))
             (AND (ODD-2-PARITY D1 D2) D3))
         (EVEN-2-PARITY D0 D4))),
which, in turn, is equivalent to the even-5-parity function.
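Because NOISE occurs only once in the evolved program, its effect can be checked by evaluating the result-producing branch once with NOISE fixed at NIL and once with NOISE fixed at T. The Python sketch below does this for all 32 fitness cases. It assumes that the row index of the truth table has ARG0 as its least significant bit; the function names are hypothetical.

```python
from itertools import product

T, NIL = 1, 0
RULE_28 = [NIL, NIL, T, T, T, NIL, NIL, NIL]


def PF0(arg0, arg1, arg2):
    # Assumed convention: row index = ARG0 + 2*ARG1 + 4*ARG2.
    return RULE_28[arg0 + 2 * arg1 + 4 * arg2]


def evolved(noise, D0, D1, D2, D3, D4):
    """Result-producing branch of the generation-8 program."""
    return PF0(PF0(PF0(noise, D3, D0), PF0(D2, D4, D0), PF0(D2, D4, D0)),
               PF0(D0, D0, D4),
               PF0(PF0(D3, D0, D3), PF0(NIL, T, D3), PF0(NIL, D2, D1)))


def even_5_parity(D0, D1, D2, D3, D4):
    return T if (D0 + D1 + D2 + D3 + D4) % 2 == 0 else NIL


hits_by_noise = {}
for noise in (NIL, T):
    hits_by_noise[noise] = sum(evolved(noise, *case) == even_5_parity(*case)
                               for case in product((0, 1), repeat=5))
print(hits_by_noise)  # {0: 32, 1: 32}
```

Either value of NOISE yields 32 hits, which is consistent with the observation that the subexpression containing NOISE always evaluates to NIL.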
24 Evolution of Closure
Every function in the function sets of all the foregoing problems in this book
has satisfied the closure requirement in that it has been able to accept, as its arguments, any value that it may possibly encounter as the return value of any
function in the function set and any value that it may possibly encounter as
the value of any terminal in the terminal set. This closure requirement has
previously been identified (section 2.3) as being desirable when applying
genetic programming to a problem. All automatically defined functions herein
have also satisfied the closure requirement.
In chapter 22, we saw that it is possible to evolve a set of primitive functions (in terms of a yet more elementary representation). Chapter 23 shows
that it is possible to evolve a set of terminals (from a sufficient superset of
terminals).
The question arises as to whether it is possible for genetic programming to
discover solutions to problems in the absence of a prior guarantee that the
closure requirement is satisfied. That is, can closure be dynamically achieved
during a run at the same time that genetic programming is simultaneously
evolving a sufficient set of primitive functions, evolving the architecture of
the overall program, and solving the problem? This question is answered in
the affirmative in this chapter.
We use the Boolean even-4- and 5-parity problems to illustrate this process.
24.1 UNDEFINED VALUES
When the primitive-defining branches are created for the initial random generation, the set of possible entries in the truth tables is expanded from NIL
and T to the following three possibilities:
• NIL,
• T,
• :UNDEFINED,
where :UNDEFINED denotes an undefined value.
In defining a primitive function, it is necessary to specify a return value for
every combination of arguments to the PF. When one (or more) arguments to
a PF can be :UNDEFINED, it becomes necessary to specify the value that the
PF returns in that situation.
Table 24.1 presents the definition in the form of a truth table of the two-argument OR function whose arguments can assume the values NIL, T, or
:UNDEFINED. Since either or both arguments (ARG0 or ARG1) can each
assume one of three possible values, the truth table has nine rows. The function returns NIL or T for the four rows of the truth table for which both ARG0
and ARG1 assume the values NIL or T; however, the function returns
:UNDEFINED for the five rows for which one or both of its arguments are
:UNDEFINED.
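The nine rows of such a table can be generated mechanically. The following Python sketch encodes the three values as strings and builds the rows in the order used by the nine-row tables of this chapter (ARG1 varying slowest); the name strict_or is a label chosen here for an OR that returns :UNDEFINED whenever either argument is undefined.

```python
NIL, T, UNDEFINED = "NIL", "T", ":UNDEFINED"
VALUES = (NIL, T, UNDEFINED)   # row order used by the nine-row tables


def strict_or(arg0, arg1):
    """Two-argument OR that returns :UNDEFINED if either argument is undefined."""
    if UNDEFINED in (arg0, arg1):
        return UNDEFINED
    return T if T in (arg0, arg1) else NIL


# Rows are listed with ARG1 varying slowest, as in table 24.1.
rows = [(arg1, arg0, strict_or(arg0, arg1)) for arg1 in VALUES for arg0 in VALUES]
for number, row in enumerate(rows):
    print(number, *row)
undefined_rows = sum(result == UNDEFINED for _, _, result in rows)
print(undefined_rows)  # 5
```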
In defining an automatically defined function, it is necessary to specify a
return value for every combination of arguments to the ADF. When an argument to an ADF can assume the value :UNDEFINED, it is necessary to specify
the value that the function should return when any of its arguments are undefined.
An ADF is a composition of PFs, ADFs, and terminals. The root of an ADF
here is always a PF or an ADF. The reasons for this are that our conventions
require that the root of each branch of a program in the initial random generation be a function and that our conventions require that the crossover fragment cannot be a terminal if the root of a branch is chosen as the point of
insertion.
24.2 PREPARATORY STEPS
Table 24.1 Nine-row truth table for the primitive-defining branch PF0 for the two-argument OR function whose arguments may assume the values NIL, T, or :UNDEFINED.
     ARG1         ARG0         PF0
0    NIL          NIL          NIL
1    NIL          T            T
2    NIL          :UNDEFINED   :UNDEFINED
3    T            NIL          T
4    T            T            T
5    T            :UNDEFINED   :UNDEFINED
6    :UNDEFINED   NIL          :UNDEFINED
7    :UNDEFINED   T            :UNDEFINED
8    :UNDEFINED   :UNDEFINED   :UNDEFINED
When the result-producing branch returns :UNDEFINED, the fitness measure must penalize the undefined value. Whether the penalty is moderate
or severe is a matter of choice. For simplicity in this discussion, we will
choose a severe penalty such that if, during the evaluation of the fitness of
a program, :UNDEFINED is returned for any fitness case, the fitness of the
program is worse than the worst possible value of fitness for a program
whose value is defined for every fitness case. For the even-4-parity problem, standardized fitness ordinarily varies between 0 and 16, with 16
being the worst. Thus, the fitness of a program that returns :UNDEFINED
for any fitness case should be at least 17. To provide additional information for analysis purposes, we decided to increment standardized fitness
by 17 for each fitness case for which the program returns :UNDEFINED.
Thus, a program with one :UNDEFINED fitness case has a standardized
fitness of 17 plus whatever contribution comes from the other 15 fitness
cases; a program with 16 :UNDEFINED fitness cases has a standardized
fitness of 272. All of these values (between 17 and 272) are worse than the
worst possible value (16) of fitness for a program whose value is defined,
but wrong, for every fitness case. Since tournament selection is the
default method of selection for both parents in this book (appendix D),
the effect is that a program with as few as one :UNDEFINED fitness case
never wins a tournament with a program whose standardized fitness varies between 0 and 16. Occasionally, when the operation being performed
is reproduction and when a tournament consists of a group of seven programs each with a standardized fitness of 17 or worse, the program that is
copied into the next generation will have a standardized fitness of 17 or
worse. Such a program will, of course, have a very low probability of
remaining in the population thereafter. Thus, even if the initial random
population contains numerous primitive-defining branches that do not
satisfy the closure requirement, genetic programming should rather rapidly select in favor of primitive-defining branches that do satisfy the closure
requirement.
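The penalty scheme can be summarized in a few lines of Python. The function name standardized_fitness and the list-based representation of program outputs are assumptions made for this sketch, and the target values used in the demonstration are placeholders rather than the actual even-4-parity fitness cases.

```python
UNDEFINED = ":UNDEFINED"
PENALTY = 17   # one more than the worst defined fitness (16) for even-4-parity


def standardized_fitness(outputs, targets):
    """outputs/targets: one entry per fitness case; outputs may be UNDEFINED."""
    total = 0
    for out, target in zip(outputs, targets):
        if out == UNDEFINED:
            total += PENALTY      # penalize each undefined fitness case
        elif out != target:
            total += 1            # ordinary error on a defined value
    return total


targets = [True, False] * 8       # placeholder targets for 16 fitness cases
print(standardized_fitness([UNDEFINED] * 16, targets))           # 272
print(standardized_fitness([not t for t in targets], targets))   # 16
print(standardized_fitness([UNDEFINED] + targets[1:], targets))  # 17
```

The three calls correspond to a program that is undefined everywhere, one that is defined but wrong everywhere, and one that is undefined for a single fitness case and correct for the other 15.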
A population size of 4,000 is used throughout this chapter. The function
sets and terminal sets for the function-defining branches and the result-producing branches are the same as in chapter 23, except that the random
constants, ℜ_ternary, ranging over the values T, NIL, and :UNDEFINED are
used. The techniques of structure-preserving crossover with point typing
(chapter 21) are used throughout this chapter.
24.3 RESULTS FOR THE EVEN-4-PARITY PROBLEM
As one would expect, many of the primitive-defining branches in the initial random generation do not satisfy the closure requirement. In fact, in
one run, 21% of the overall programs return :UNDEFINED for all 16 fitness cases and have a standardized fitness of 272. An additional 31%
return :UNDEFINED for between one and 15 of the 16 fitness cases. However, 48% of the programs in the initial random population satisfy the
closure requirement for all 16 fitness cases.
The median program from generation 0 of run 1 scores 41, has {2,2,1,1} as
the argument map for its four primitive-defining branches, and has {4,2,3,4}
as the argument map for its four function-defining branches.
The best of generation 0 of run 1 (shown below) has a standardized
fitness of 4 (i.e., scores 12 out of 16 hits), has an argument map of {2,2} for
its primitive-defining branches, and has an argument map of {1} for its
function-defining branch.
(progn (defun PF0 (ARG0 ARG1)
  (truth-table T :UNDEFINED :UNDEFINED :UNDEFINED T
  NIL NIL NIL NIL))
 (defun PF1 (ARG0 ARG1)
  (truth-table :UNDEFINED T NIL T NIL :UNDEFINED T NIL
  :UNDEFINED))
 (defun ADF0 (ARG0)
  (values (PF0 (PF0 ARG0 :UNDEFINED) (PF0 T
  :UNDEFINED))))
 (values (PF0 (PF0 D0 D2) (PF0 D3 D1)))).
In this program PF0 appears in the result-producing branch, but PF1 and
ADF0 do not.
Table 24.2 shows the truth table for PF0 from the best of generation 0 of
run 1.
The fact that three of the nine rows of the truth table for PF0 contain
:UNDEFINED seems highly unfavorable; however, the result-producing branch
of the best of generation 0 contains only the terminals D0, D1, D2, and D3 and
does not contain any occurrences of the random constant :UNDEFINED. Thus,
we need not be concerned about the :UNDEFINED value appearing in row 2
of the truth table. In fact, the two inner PF0s in the result-producing branch
return either T or :UNDEFINED, but never NIL. The two inner PF0s of the
result-producing branch each return T whenever their two arguments agree
(rows 0 and 4), so the outer PF0 returns T (from row 4) whenever D0 matches
D2, and D3 matches D1. Thus, for a few fitness cases, the result-producing
branch has the behavior of the even-4-parity function. The two inner PFs of
the result-producing branch each return :UNDEFINED (rows 1 and 3) whenever their two arguments disagree. The outer PF0 can return NIL (from row
8) if both inner PFs return :UNDEFINED or it can return NIL (from rows 5 or
7) if exactly one inner PF returns :UNDEFINED. Thus, this particular program
from generation 0 never returns :UNDEFINED and its behavior bears some
resemblance to the target even-4-parity function.
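The argument above can be replayed mechanically. In the Python sketch below the three values are encoded as 0 (NIL), 1 (T), and 2 (:UNDEFINED), and the row index of the nine-row table is taken to be 3*ARG1 + ARG0, matching the column order of table 24.2. These encodings and the function names are assumptions made for the illustration.

```python
from itertools import product

NIL, T, U = 0, 1, 2          # U stands for :UNDEFINED
PF0_TABLE = [T, U, U, U, T, NIL, NIL, NIL, NIL]   # rows 0..8 of table 24.2


def PF0(arg0, arg1):
    # Row index = 3*ARG1 + ARG0, matching the ARG1/ARG0 column order.
    return PF0_TABLE[3 * arg1 + arg0]


def program(D0, D1, D2, D3):
    """Result-producing branch (PF0 (PF0 D0 D2) (PF0 D3 D1))."""
    return PF0(PF0(D0, D2), PF0(D3, D1))


def even_4_parity(*bits):
    return T if sum(bits) % 2 == 0 else NIL


cases = list(product((NIL, T), repeat=4))
outputs = [program(*c) for c in cases]
hits = sum(out == even_4_parity(*c) for out, c in zip(outputs, cases))
print(U in outputs, hits)  # False 12
```

The program never returns the undefined value and scores 12 hits, in agreement with the standardized fitness of 4 reported for this individual.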
Table 24.2 Nine-row truth table for two-argument PF0 from the best of generation 0 of run 1 of the even-4-parity problem.
     ARG1         ARG0         PF0
0    NIL          NIL          T
1    NIL          T            :UNDEFINED
2    NIL          :UNDEFINED   :UNDEFINED
3    T            NIL          :UNDEFINED
4    T            T            T
5    T            :UNDEFINED   NIL
6    :UNDEFINED   NIL          NIL
7    :UNDEFINED   T            NIL
8    :UNDEFINED   :UNDEFINED   NIL
We now consider a second run.
In generation 4, the following 100%-correct program emerges with an
argument map of {2,1} for its two primitive-defining branches and an argument map of {4,2,2,2} for its four function-defining branches:
(progn (defun PF0 (ARG0 ARG1)
  (truth-table T NIL NIL NIL T T :UNDEFINED :UNDEFINED
  :UNDEFINED))
 (defun PF1 (ARG0)
  (truth-table :UNDEFINED NIL :UNDEFINED))
 (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
  (values (PF1 (PF0 (PF1 ARG0) (PF0 ARG1 ARG1)))))
 (defun ADF1 (ARG0 ARG1)
  (values (PF0 NIL ARG0)))
 (defun ADF2 (ARG0 ARG1)
  (values (PF0 (PF0 (PF0 ARG0 NIL) (PF0 NIL ARG1))
  (PF0 (PF0 ARG1 ARG0) (PF0 ARG0 ARG0)))))
 (defun ADF3 (ARG0 ARG1)
  (values (ADF1 (ADF1 (ADF2 ARG0 ARG1) (PF1 ARG0))
  (PF0 (PF0 NIL NIL) (PF1 ARG0)))))
 (values (PF0 (PF0 D2 D3) (PF0 D1 D0)))).
In this program PF0 (but no ADFs) appears in the result-producing branch.
Table 24.3 shows the definition of PF0 for the result-producing branch of this
best-of-run program from generation 4.
Here PF0 returns :UNDEFINED only when its first argument is
:UNDEFINED. When both of its arguments are defined, this new PF0 is equivalent to the EVEN-2-PARITY function. Since the result-producing branch contains only the four actual variables of the problem (D0, D1, D2, and D3) and
does not contain any occurrences of the random constant :UNDEFINED, the
result-producing branch becomes
(EVEN-2-PARITY (EVEN-2-PARITY D2 D3) (EVEN-2-PARITY D1 D0)),
which, in turn, is equivalent to the even-4-parity function.
Table 24.3 Nine-row truth table for two-argument PF0 from the solution from generation 4 of run 1 of the even-4-parity problem.
     ARG1         ARG0         PF0
0    NIL          NIL          T
1    NIL          T            NIL
2    NIL          :UNDEFINED   NIL
3    T            NIL          NIL
4    T            T            T
5    T            :UNDEFINED   T
6    :UNDEFINED   NIL          :UNDEFINED
7    :UNDEFINED   T            :UNDEFINED
8    :UNDEFINED   :UNDEFINED   :UNDEFINED
In generation 7 of run 2, the following 100%-correct program emerges with
an argument map of {2} for its one primitive-defining branch and an argument map of {2,4,2} for its three function-defining branches:
(progn (defun PF0 (ARG0 ARG1)
  (truth-table T NIL NIL :UNDEFINED T NIL T NIL :UNDEFINED))
 (defun ADF0 (ARG0 ARG1)
  (values (PF0 (PF0 (PF0 (PF0 T ARG1) (PF0 ARG0 T)) (PF0
  (PF0 :UNDEFINED ARG1) (PF0 ARG1 ARG1))) (PF0 (PF0
  (PF0 :UNDEFINED ARG1) (PF0 ARG1 ARG0)) (PF0 (PF0 NIL
  ARG0) (PF0 ARG0 ARG1))))))
 (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
  (values (ADF0 ARG1 T)))
 (defun ADF2 (ARG0 ARG1)
  (values (PF0 (ADF1 (ADF1 (ADF1 ARG1 ARG1 ARG1 ARG1)
  (ADF0 ARG1 ARG1) (ADF0 ARG1 ARG1) (ADF0 ARG1 T))
  (PF0 (ADF1 ARG1 :UNDEFINED NIL ARG0) (ADF0 ARG0
  :UNDEFINED)) (ADF0 (PF0 ARG1 ARG1) (PF0 T :UNDEFINED)) (PF0 (ADF1 ARG0 ARG1 ARG0 :UNDEFINED) (PF0
  NIL ARG1))) (ADF0 NIL (ADF1 (ADF0 T :UNDEFINED)
  (ADF0 ARG0 ARG1) (PF0 ARG1 ARG0) (ADF0 ARG1 T))))))
 (values (ADF0 (ADF0 (ADF0 (ADF2 D1 D3) (PF0 NIL D2))
 (ADF1 (ADF1 D3 D1 T D1) (ADF1 T D1 D1 D2) T (ADF0 D2
 NIL))) (ADF0 (ADF0 (ADF1 D1 :UNDEFINED D1 D0) (ADF2 D3
 D0)) (ADF2 (ADF2 D3 D0) (PF0 :UNDEFINED D1)))))).
In this program PF0, ADF0, ADF1, and ADF2, along with the random
constants NIL, T, and :UNDEFINED all appear in the result-producing
branch. Moreover, :UNDEFINED appears twice in the definition of PF0, as
shown in table 24.4. Nonetheless, the result-producing branch never
returns :UNDEFINED and has the desired behavior of the target even-4-parity function.
Table 24.4 Truth table for two-argument PF0 from the best-of-run program from generation 7 of run 2.
     ARG1         ARG0         PF0
0    NIL          NIL          T
1    NIL          T            NIL
2    NIL          :UNDEFINED   NIL
3    T            NIL          :UNDEFINED
4    T            T            T
5    T            :UNDEFINED   NIL
6    :UNDEFINED   NIL          T
7    :UNDEFINED   T            NIL
8    :UNDEFINED   :UNDEFINED   :UNDEFINED
24.4 RESULTS FOR THE EVEN-5-PARITY PROBLEM
This section considers the even-5-parity problem using the evolutionary
method of determining the architecture, a sufficient set of primitive functions,
and closure.
The best of generation 0 of one run scores 17 hits (out of 32), has {1,2} as the
argument map for its two primitive-defining branches, and has {2} as the
argument map for its one function-defining branch.
On generation 27, the following 100%-correct program scores 32 hits,
has {1,2} as the argument map for its two primitive-defining branches,
and has {3,4,4} as the argument map for its three function-defining
branches:
(progn (defun PF0 (ARG0)
  (truth-table T :UNDEFINED T))
 (defun PF1 (ARG0 ARG1)
  (truth-table NIL NIL :UNDEFINED T :UNDEFINED NIL NIL T
  NIL))
 (defun ADF0 (ARG0 ARG1 ARG2)
  (values (PF1 (PF0 NIL) (PF1 (PF0 ARG2) (PF1 NIL
  ARG1)))))
 (defun ADF1 (ARG0 ARG1 ARG2 ARG3)
  (values (PF1 (PF1 (PF1 ARG0 ARG3) (ADF0 ARG0 ARG0
  ARG2)) (PF0 T))))
 (defun ADF2 (ARG0 ARG1 ARG2 ARG3)
  (values (ADF0 (PF1 (PF0 ARG2) (PF0 T)) (PF0 (PF0 (PF0
  T))) (PF1 (PF0 (PF1 ARG0 ARG0)) (PF1 (PF0 ARG2) (PF1
  ARG1 :UNDEFINED))))))
 (values (ADF1 (PF1 (PF1 (ADF0 D2 NIL NIL) (PF1 (ADF0 D2
 NIL NIL) (ADF2 D3 D4 D4 D3))) D1) (PF1 (PF1 (ADF2 T D0
 D4 NIL) (ADF2 D3 D4 D4 D3)) (ADF1 D4 NIL D1 D4)) (PF1
 (ADF2 T D0 D3 NIL) (PF1 (ADF2 D3 D2 D4 :UNDEFINED)
 (PF0 (PF0 T)))) (PF1 (ADF0 :UNDEFINED NIL NIL) (ADF2
 D3 D4 D4 D3))))).
Table 24.5 Three-row truth table for one-argument PF0 from the solution from generation 27 of the even-5-parity problem.
     ARG0         PF0
0    NIL          T
1    T            :UNDEFINED
2    :UNDEFINED   T
Table 24.6 Nine-row truth table for two-argument PF1 from the solution from generation 27 of the even-5-parity problem.
     ARG1         ARG0         PF1
0    NIL          NIL          NIL
1    NIL          T            NIL
2    NIL          :UNDEFINED   :UNDEFINED
3    T            NIL          T
4    T            T            :UNDEFINED
5    T            :UNDEFINED   NIL
6    :UNDEFINED   NIL          NIL
7    :UNDEFINED   T            T
8    :UNDEFINED   :UNDEFINED   NIL
Figure 24.1 Performance curves for the even-5-parity problem showing that E_with = 240,000
with evolution of closure. (Panel: With Defined Functions; horizontal axis: Generation.)
Table 24.5 is a three-row truth table for the one-argument PF0 from the
best of generation 27 of the even-5-parity problem.
Table 24.6 is a nine-row truth table for the two-argument PF1 from the
best of generation 27 of the even-5-parity problem.
This three-argument ADF0 has the behavior of (ODD-2-PARITY ARG1
ARG2) for the eight of its 27 combinations of arguments that contain no
occurrences of :UNDEFINED.
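The claim about ADF0 can be confirmed by composing the two truth tables directly. The Python sketch below encodes NIL, T, and :UNDEFINED as 0, 1, and 2 and indexes the nine-row table by 3*ARG1 + ARG0; both conventions, like the function names, are assumptions made for the illustration.

```python
from itertools import product

NIL, T, U = 0, 1, 2                                # U stands for :UNDEFINED
PF0_TABLE = [T, U, T]                              # table 24.5
PF1_TABLE = [NIL, NIL, U, T, U, NIL, NIL, T, NIL]  # table 24.6


def PF0(arg0):
    return PF0_TABLE[arg0]


def PF1(arg0, arg1):
    return PF1_TABLE[3 * arg1 + arg0]              # ARG1 varies slowest


def ADF0(arg0, arg1, arg2):
    """(PF1 (PF0 NIL) (PF1 (PF0 ARG2) (PF1 NIL ARG1)))"""
    return PF1(PF0(NIL), PF1(PF0(arg2), PF1(NIL, arg1)))


def odd_2_parity(a, b):
    return T if a != b else NIL


defined = list(product((NIL, T), repeat=3))        # the 8 fully defined inputs
matches = all(ADF0(a0, a1, a2) == odd_2_parity(a1, a2) for a0, a1, a2 in defined)
print(matches)  # True
```

Note that intermediate values inside ADF0 do become :UNDEFINED for some of these eight inputs; the outer call to PF1 maps them back to defined values.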
Even though the result-producing branch of this 100%-correct program
contains two occurrences of the random constant :UNDEFINED, it nonetheless has the behavior of the even-5-parity function.
Figure 24.1 presents the performance curves based on the 14 runs of the
even-5-parity problem with the evolutionary method of simultaneously
determining the architecture, a sufficient set of primitive functions, and
closure. The cumulative probability of success, P(M,i), is 93% by generation 29 and is still 93% at generation 50. The two numbers in the oval
indicate that if this problem is run through to generation 29, processing a
total of E_with = 240,000 individuals (i.e., 4,000 x 30 generations x 2 runs) is
sufficient to yield a solution to this problem with 99% probability.
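The arithmetic behind the figure of 240,000 can be reproduced under the usual assumption that runs are independent, so that the number of runs R needed to reach an overall success probability z is the smallest integer with 1 - (1 - P(M,i))^R >= z. This relation is not restated in the passage above and is assumed here; the Python function names are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """Independent runs needed so that at least one succeeds with probability z."""
    return math.ceil(math.log(1 - z) / math.log(1 - p_success))


def individuals_to_process(pop_size, generation, p_success, z=0.99):
    """E = M * (i + 1) * R(z), counting generations 0 through i."""
    return pop_size * (generation + 1) * runs_required(p_success, z)


print(runs_required(0.93))                     # 2
print(individuals_to_process(4000, 29, 0.93))  # 240000
```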
25 Simultaneous Evolution of Architecture, Primitive Functions, Terminals, Sufficiency, and Closure
The sixth major preparatory step in applying genetic programming with
automatically defined functions to a problem is to determine the architecture
of the yet-to-be-evolved programs.
The second major preparatory step in applying genetic programming to a
problem is to determine the set of primitive functions from which the programs to be evolved will be composed.
The first major preparatory step in applying genetic programming to a problem is to determine the set of terminals from which the programs to be evolved
will be composed.
The set of primitive functions and the set of terminals should satisfy both
the sufficiency requirement and the closure requirement.
Chapters 21 through 24 demonstrated that genetic programming is capable
of evolving (selecting), in various separate combinations, the solution to a
problem, the architecture, the primitive functions, and the terminals while
satisfying the sufficiency requirement and the closure requirement. In this
chapter we demonstrate that genetic programming is capable of solving a
problem while simultaneously evolving all five of these attributes together.
25.1 PREPARATORY STEPS
The process proceeds in the same manner as in chapter 24.
If the problem is the even-4-parity problem, then the terminal set is
T = {NOISE, D0, D1, D2, D3, ℜ_ternary},
where NOISE is an extraneous noise variable (described in section 23.1) and where
ℜ_ternary is the random constant ranging over the values T, NIL, and :UNDEFINED.
A population size of 4,000 and the techniques of structure-preserving
crossover with point typing (chapter 21) are used throughout this chapter.
25.2 RESULTS FOR EVEN-4-PARITY PROBLEM
The best of generation 0 has a standardized fitness of 4 (i.e., scores 12 out of 16
hits), has an argument map of {2,2} for its two primitive-defining branches,
and has no function-defining branches. It is shown below:
(progn (defun PF0 (ARG0 ARG1)
  (truth-table NIL NIL T T NIL NIL NIL T :UNDEFINED))
 (defun PF1 (ARG0 ARG1)
  (truth-table :UNDEFINED T :UNDEFINED NIL T :UNDEFINED
  :UNDEFINED :UNDEFINED :UNDEFINED))
 (values (PF0 (PF0 (PF1 (PF0 D1 D1) (PF1 D2 D0)) (PF0 (PF0
 NOISE :UNDEFINED) (PF0 D2 D1))) (PF1 (PF1 (PF1 NOISE
 NOISE) (PF0 D2 D3)) (PF0 (PF1 D0 D3) (PF1 D3 D1)))))).
The noise variable, NOISE, appears three times in the result-producing
branch of this program.
A 100%-correct solution to the even-4-parity problem emerges on generation 8 with a standardized fitness of 0 (i.e., scores 16 out of 16 hits), an argument map of {2} for its one primitive-defining branch, and an argument map
of {4,1} for its two function-defining branches:
(progn (defun PF0 (ARG0 ARG1)
  (truth-table T NIL NIL NIL T NIL T T :UNDEFINED))
 (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
  (values (PF0 (PF0 ARG1 ARG1) (PF0 T T))))
 (defun ADF1 (ARG0)
  (values (PF0 T ARG0)))
 (values (PF0 (PF0 NIL NIL)
 (PF0 (PF0 D3 D1) (PF0 D2 D0)))))
NOISE does not appear in the result-producing branch of this program.
Table 25.1 is the truth table for the primitive-defining branch PF0 for the best-of-run program from generation 8 of run 1.
According to the truth table, (PF0 NIL NIL) is T. Since the actual variables of the problem (D0, D1, D2, and D3) can assume only defined values,
PF0 is equivalent to EVEN-2-PARITY. The result-producing branch can be
simplified to
(values (EVEN-2-PARITY T
 (EVEN-2-PARITY (EVEN-2-PARITY D3 D1)
 (EVEN-2-PARITY D2 D0)))).
Since (EVEN-2-PARITY T <<X>>) is equivalent to <<X>>, for all <<X>>, the
result-producing branch can be further simplified to
(values (EVEN-2-PARITY (EVEN-2-PARITY D3 D1)
 (EVEN-2-PARITY D2 D0))),
which is equivalent to the even-4-parity function.
25.3 RESULTS FOR EVEN-5-PARITY PROBLEM
The previous section established that it is possible to simultaneously evolve
the solution to a problem, the architecture, the primitive functions, and the
terminals while satisfying the sufficiency requirement and the closure requirement for the even-4-parity problem. We now apply this process to a series of
runs of the even-5-parity problem.
Table 25.1 Nine-row truth table for the primitive-defining branch PF0 for the solution from generation 8.
     ARG1         ARG0         PF0
0    NIL          NIL          T
1    NIL          T            NIL
2    NIL          :UNDEFINED   NIL
3    T            NIL          NIL
4    T            T            T
5    T            :UNDEFINED   NIL
6    :UNDEFINED   NIL          T
7    :UNDEFINED   T            T
8    :UNDEFINED   :UNDEFINED   :UNDEFINED
Table 25.2 Distribution of architectures of the PFs and ADFs of 14 solutions to the even-5-parity problem with evolution of architecture, primitive functions, sufficiency, terminals and closure.
Run  Generation   Number   Argument map  Number    Argument map
     when solved  of PFs   for PFs       of ADFs   for ADFs
1    4            1        {2}           3         {4,3,4}
2    12           3        {2,1,2}       4         {4,4,1,3}
3    11           3        {2,2,2}       0         {}
4    12           1        {2}           2         {4,4}
5    3            1        {2}           0         {}
6    21           4        {1,2,1,2}     3         {4,2,1}
7    10           4        {2,2,2,2}     1         {4}
8    6            2        {1,2}         3         {2,4,4}
9    6            1        {2}           4         {4,3,3,3}
10   21           3        {2,1,2}       4         {3,4,2,4}
11   14           2        {2,2}         0         {}
12   9            3        {2,2,2}       3         {4,4,3}
13   11           2        {2,2}         2         {4,4}
14   5            1        {2}           0         {}
We made 22 runs, of which 14 (64%) produced a 100%-correct solution by generation 50.
There was no convergence to any particular architecture for this particular problem. Table 25.2 shows the variation among the 14 solutions in the number of primitive-defining branches, the number of arguments each PF possesses, the number of function-defining branches, and the number of arguments each automatically defined function possesses. The average number of PFs is 2.21 and the average number of automatically defined functions is 2.07 for the 14 solutions. Four of these 14 solutions did not employ automatically defined functions.
Figure 25.1 presents the performance curves based on the 22 runs of the even-5-parity problem with the evolution of the architecture, primitive functions, sufficiency, terminals and closure. The cumulative probability of success, P(M,i), is 55% by generation 14 and 64% by generation 50. The two numbers in the oval indicate that if this problem is run through to generation 14, processing a total of E_with = 360,000 individuals (i.e., 4,000 x 15 generations x 6 runs) is sufficient to yield a solution to this problem with 99% probability.
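The arithmetic behind the figure of 360,000 can be retraced with a short Python sketch. It assumes the usual definition from Genetic Programming of the number of independent runs R(z) needed to reach a success probability z, namely the smallest integer R such that 1 - (1 - P(M,i))^R is at least z. The function name and variable names below are ours, introduced only for illustration.

```python
import math


def runs_required(p_success, z=0.99):
    """Smallest number of independent runs R with 1 - (1 - p)^R >= z."""
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


population_size = 4000      # M, as used for these runs
generation = 14             # generation i at which P(M,i) is read off
p_success = 0.55            # cumulative probability of success P(M,i)

runs = runs_required(p_success)
# Generations are numbered from 0, so generation 14 means 15 generations.
effort = population_size * (generation + 1) * runs
print(runs, effort)
```

With P(M,i) = 55% the sketch yields 6 runs and 4,000 x 15 x 6 = 360,000 individuals, matching the numbers in the oval.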
In run 1 from table 25.2, the even-5-parity problem is solved on generation 4. The solution has {2} as the argument map for its primitive-defining branches and {4,3,4} as the argument map for its function-defining branches.
(progn (defun PF0 (ARG0 ARG1)
         (truth-table T NIL NIL NIL T T T NIL :UNDEFINED))
       (defun ADF0 (ARG0 ARG1 ARG2 ARG3)
         (values (PF0 (PF0 (PF0 (PF0 ARG3 ARG1) (PF0 ARG1 ARG2))
           (PF0 (PF0 ARG1 ARG3) (PF0 ARG1 ARG1))) (PF0 ARG1
           ARG1))))
       (defun ADF1 (ARG0 ARG1 ARG2)
         (values (PF0 (PF0 (ADF0 (PF0 ARG1 ARG1) (PF0 :UNDEFINED
           ARG2) (PF0 ARG0 ARG1) (PF0 NIL ARG1)) (ADF0 (ADF0 ARG0
           :UNDEFINED ARG0 ARG0) (PF0 ARG0 ARG0) (ADF0 ARG2
           :UNDEFINED ARG0 ARG1) (ADF0 T ARG2 ARG1 ARG0))) (PF0
           ARG0 T))))
       (defun ADF2 (ARG0 ARG1 ARG2 ARG3)
         (values (PF0 (PF0 (ADF0 (ADF0 ARG2 ARG2 ARG0 ARG0) (ADF1
           ARG0 ARG2 ARG3) (PF0 ARG0 ARG3) (PF0 ARG0 T)) (PF0
           (ADF0 ARG1 ARG1 ARG3 ARG3) (ADF1 ARG1 T ARG1))) (ADF1
           (PF0 (ADF0 T ARG0 T ARG3) (ADF1 ARG3 ARG0 ARG3)) (ADF1
           (ADF0 ARG3 ARG3 :UNDEFINED T) (ADF1 :UNDEFINED ARG3
           ARG1) (PF0 ARG2 ARG2)) (ADF0 (ADF1 T NIL ARG3) (ADF0
           ARG1 T ARG2 ARG0) (ADF0 :UNDEFINED :UNDEFINED ARG3 T)
           (ADF1 ARG3 ARG0 ARG3))))))
       (values (PF0 (ADF1 (ADF2 (PF0 D4 D1) (ADF0 D4 NOISE D3 D0)
         (ADF2 D2 NOISE NIL NOISE) (ADF0 D3 NOISE D1 D1)) (ADF1
         (PF0 D3 D3) (ADF2 D4 D0 NOISE D0) (PF0 D0 NIL)) (PF0
         (ADF0 D3 D3 D4 D4) (ADF2 D4 D4 D2 T))) (PF0 (PF0 (ADF0
         NOISE D2 D2 D3) (ADF2 NIL D4 T :UNDEFINED)) (ADF0 (ADF1
         :UNDEFINED D3 D1) (PF0 D1 D4) (PF0 D2 D4) (PF0 D3
         D4)))))).
Table 25.3 shows that PF0 from this program has the behavior of the even-2-parity function whenever both of its arguments are defined. The three automatically defined functions are defined in terms of PF0, and the result-producing branch is then defined in terms of PF0, ADF0, ADF1, and ADF2.
[Figure 25.1 plot: panel "With Defined Functions"; horizontal axis: Generation.]
Figure 25.1 Performance curves for the even-5-parity problem showing that E_with = 360,000 with evolution of architecture, primitive functions, sufficiency, terminals and closure.
Table 25.3 Nine-row truth table for the primitive-defining branch PF0 from the solution from run 1 to the even-5-parity problem with the evolution of architecture, primitive functions, sufficiency, terminals and closure.
Row  ARG1        ARG0        PF0
0    NIL         NIL         T
1    NIL         T           NIL
2    NIL         :UNDEFINED  NIL
3    T           NIL         NIL
4    T           T           T
5    T           :UNDEFINED  T
6    :UNDEFINED  NIL         T
7    :UNDEFINED  T           NIL
8    :UNDEFINED  :UNDEFINED  :UNDEFINED
Figure 25.2 depicts two three-dimensional trajectories, by generation, showing the hits and the number of primitive-defining branches and function-defining branches in the best-of-generation programs of this run with the evolutionary method of simultaneously determining the architecture, primitive functions, sufficiency, terminals and closure. The axis labeled "branches" refers to both the number of primitive-defining branches and the number of function-defining branches. The first trajectory (shown with a broken line) traces, by generation, the number of primitive-defining branches. The second trajectory (shown with a solid line) traces, by generation, the number of function-defining branches.
Figure 25.3 is a three-dimensional histogram for run 1 of the even-5-parity problem with the evolutionary method of simultaneously determining the architecture, primitive functions, sufficiency, terminals and closure. This histogram shows, by generation, the number of programs in the population of 4,000 with a specified number of primitive-defining branches (from 1 to 3) and a specified number of function-defining branches (from 0 to 4). As can be seen, there is approximate equality at generation 0 in the number of programs with one, two, and three primitive-defining branches and with zero, one, two, three, and four function-defining branches. However, by generation 4, the most common configuration consists of four primitive-defining branches and three function-defining branches.
[Figure 25.2 plot: axes Hits, Branches, and Generation; trajectories for primitive-defining branches and function-defining branches.]
Figure 25.2 Fitness-branch trajectory of hits and the number of primitive-defining branches and the number of function-defining branches between generations 0 and 4 for the best-of-generation programs of run 1 of the even-5-parity problem with evolution of architecture, primitive functions, sufficiency, terminals and closure.
Figure 25.3 Branch histogram between generations 0 and 4 of the number of programs in the population with various numbers of primitive-defining branches and function-defining branches for run 1 of the even-5-parity problem with evolution of architecture, primitive functions, sufficiency, terminals and closure.
25.4 SUMMARY
We have demonstrated that genetic programming is capable of simultaneously solving a problem while separately evolving the architecture, the primitive functions, and the terminals and while satisfying the sufficiency requirement and the closure requirement.
The fifth major preparatory step in applying genetic programming involves only the administrative matter of determining the termination predicate and the method of result designation. The fourth major preparatory step involves selecting parameters, of which the population size, M, and the maximum number of generations to be run, G, are the most important. One can envisage allowing the population size to evolve to its own best level on the basis of fitness; however, I believe that the best choice of population size for any nontrivial and interesting problem that one is likely to encounter in the foreseeable future is the largest possible population size that is supported by the available computing machinery. Of course, in nature, there is no artificial limit on G since populations reproduce themselves indefinitely.
Since the first, second, fourth, fifth, and sixth major preparatory steps as well as the sufficiency requirement and the closure requirement can either be replaced by a competitive evolutionary process or can be said to be of secondary importance, the third major step appears (based on the limited number of problems considered) to be the irreducible requirement for genetic programming.
Fitness need not always be explicit (as it is in this book). Instead, it can be implicit as it is when two or more populations co-evolve in a common ecology (Genetic Programming, chapter 16) or as it is when the members of a single population merely interact with one another and either survive and reproduce or die. Implicit fitness is often used in simulations in the field of artificial life. However, whether explicit or implicit, the conclusion is that the irreducible requirement for genetic programming is the fitness measure and that structure arises from fitness.
26 The Role of Representation and the Lens Effect
Representation plays a key role in facilitating or thwarting the solution of problems by means of artificial intelligence and machine learning. This chapter explores the question of how a representation employing automatically defined functions differs from a representation without automatically defined functions.
The focus in this chapter is solely on the role of representation, not on the role of genetic programming in solving problems. This chapter considers only populations of random individuals (i.e., the initial random generation in genetic programming). The initial random generation of a run of genetic programming is, of course, an exercise in blind random search in the search space of permissible structures for possible solutions to the problem. Given an initial random population of computer programs, any one of many different parallel adaptive methods could potentially be used to modify the individuals in an initial random population. For example, one might define a modifying operator that mutates a single point in the parse tree of the program to a different value having the same arity (number of branches radiating away from the point) and then employ parallel hill climbing, parallel simulated annealing, or some other adaptive method to create a new generation of programs. However, at the time when the initial random population is created, no commitment has yet been made as to which adaptive method might be used to try to discover better points in the search space of the problem.
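The arity-preserving point mutation mentioned above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not an operator taken from the book: parse trees are represented as nested tuples, and the table PRIMITIVES (a hypothetical name) groups an illustrative set of Boolean symbols by arity.

```python
import random

# Hypothetical primitives grouped by arity (0 = terminal, 2 = two-argument function).
PRIMITIVES = {0: ["D0", "D1", "D2"], 2: ["AND", "OR", "NAND", "NOR"]}


def count_points(tree):
    """Number of points (internal and external) in a parse tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_points(child) for child in tree[1:])


def mutate_point(tree, index, rng):
    """Copy of tree whose index-th point (preorder) gets a different label of equal arity."""
    if isinstance(tree, str):
        label, children = tree, []
    else:
        label, children = tree[0], list(tree[1:])
    if index == 0:
        # Choose a replacement with the same number of arguments.
        choices = [s for s in PRIMITIVES[len(children)] if s != label]
        new_label = rng.choice(choices)
        return new_label if not children else (new_label, *children)
    index -= 1
    new_children = []
    for child in children:
        size = count_points(child)
        if 0 <= index < size:
            child = mutate_point(child, index, rng)
        index -= size
        new_children.append(child)
    return (label, *new_children)


rng = random.Random(0)
parent = ("AND", ("OR", "D0", "D1"), "D2")
child = mutate_point(parent, rng.randrange(count_points(parent)), rng)
print(parent, child)
```

Because the replacement always has the same arity as the symbol it replaces, the shape of the tree is unchanged and the offspring remains a syntactically valid program. Any hill-climbing or annealing loop built on top of this operator would be a further design choice that the text deliberately leaves open.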
The Boolean even-3-parity problem (section 6.1) can be used to illustrate how representation can facilitate or thwart the solution of a problem. For example, if a three-argument Boolean function is represented as a truth table (such as table 6.1), then the search space of the problem consists of $2^{2^3} = 2^8 = 256$ possible eight-row truth tables. A blind random search procedure that fills in the value of the Boolean function for the eight rows of the truth table with either T or NIL has a probability of 1:256 of finding an arbitrary three-argument Boolean function. Thus, when a three-argument Boolean function is represented by means of a truth table, a blind random search has a probability of success of 1:256. This probability is independent of which of the 256 three-argument Boolean functions is being sought.
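The count of 256 can be confirmed by brute force. The Python sketch below is only an illustration with names of our own choosing; it enumerates every eight-row truth table and counts how many coincide with the even-3-parity function.

```python
from itertools import product

# The eight rows of a three-argument truth table.
ROWS = list(product((0, 1), repeat=3))

# One output bit per row, so there are 2^8 possible truth tables.
num_functions = 2 ** len(ROWS)

# Output column of even-3-parity: 1 when an even number of inputs are 1.
even_3_parity = tuple(1 if sum(row) % 2 == 0 else 0 for row in ROWS)

# Exhaustively enumerate all truth tables and count the matching ones.
matches = sum(
    table == even_3_parity for table in product((0, 1), repeat=len(ROWS))
)
print(num_functions, matches)
```

Exactly one of the 256 tables matches, so a uniform random choice of truth table succeeds with probability 1:256, whichever target function is substituted for even-3-parity.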
On the other hand, suppose that the search space of the problem consists of compositions of the terminals from the terminal set
$\mathcal{T}$ = {D0, D1, D2}
and the functions from the function set
$\mathcal{F}$ = {AND, OR, NAND, NOR}
with an argument map for this function set of
{2, 2, 2, 2}.
A blind random search has a very different probability of success when three-argument Boolean functions are represented as such compositions (i.e., as parse trees or LISP S-expressions). Moreover, in contrast to the representation employing truth tables, the probability of success in the blind random search is dependent on the particular three-argument Boolean function chosen.
Consider, for example, the Boolean even-3-parity function (three-argument Boolean rule 105). The probability of success of a blind random search in the space of compositions of functions from the function set, $\mathcal{F}$, and terminals from the terminal set, $\mathcal{T}$, is so low that, after 10,000,000 tries, we never found even a single random program that performed the even-3-parity function (Genetic Programming, table 9.3). In other words, in the space of computer programs (over $\mathcal{F}$ and $\mathcal{T}$), the even-3-parity function is extremely difficult to find by means of a blind random search. The representation of Boolean functions as computer programs makes the learning of the even-3-parity problem decidedly more difficult than a representation based on a truth table. The extreme difficulty of solving the even-3-parity problem by blind random search is reflected by the fact that the cumulative probability of success, P(M,i), is 0% for generation 0 in the performance curve in figure 6.2 for the even-3-parity function without automatically defined functions. The minimum number of individuals required to be processed to yield a solution to the even-3-parity problem (figure 6.2) was only 544,000 individuals (i.e., 34 runs with a population of 16,000). This is considerably smaller than 10,000,000.
In contrast, three-argument Boolean rule 000 ("Always off") is randomly generated with a probability of 1:6.76 in the space of compositions of functions from the same function set, $\mathcal{F}$, and terminals from the same terminal set, $\mathcal{T}$, so rule 000 is much easier to find in the space of programs than in the space of truth tables.
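The dependence of the success probability on the target function can be explored with a small blind random search over compositions of AND, OR, NAND, and NOR. The Python sketch below is an illustration only. In particular, the tree generator (a simple "grow" procedure with a depth limit of 4 and a 30% chance of stopping at a terminal) is our assumption and is not the generative method or size limit used for the experiments cited in the text, so its frequencies should not be expected to reproduce the published figures.

```python
import random
from itertools import product

FUNCTIONS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NAND": lambda a, b: not (a and b),
    "NOR": lambda a, b: not (a or b),
}
TERMINALS = ("D0", "D1", "D2")


def random_tree(rng, depth):
    """Assumed 'grow'-style generator; not the book's generative method."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, env):
    """Evaluate a parse tree for one assignment of the terminals."""
    if isinstance(tree, str):
        return env[tree]
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, env), evaluate(right, env))


def truth_table(tree):
    """Eight-row output column of the Boolean function a tree computes."""
    return tuple(
        bool(evaluate(tree, dict(zip(TERMINALS, bits))))
        for bits in product((False, True), repeat=3)
    )


def hits(tree, target):
    """Number of fitness cases (rows) on which the tree agrees with the target."""
    return sum(a == b for a, b in zip(truth_table(tree), target))


EVEN_3_PARITY = tuple(
    sum(bits) % 2 == 0 for bits in product((False, True), repeat=3)
)
ALWAYS_OFF = (False,) * 8

rng = random.Random(1)
sample = [random_tree(rng, depth=4) for _ in range(2000)]
parity_found = sum(truth_table(t) == EVEN_3_PARITY for t in sample)
always_off_found = sum(truth_table(t) == ALWAYS_OFF for t in sample)
print(parity_found, always_off_found)
```

Comparing the two counts for a given sample gives a rough feel for how unevenly random programs are spread over the 256 three-argument Boolean functions, in contrast to the uniform 1:256 of the truth-table representation.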
The reader may have noticed that the cumulative probability of success, P(M,i), is 39% for generation 0 in the performance curve in figure 6.10 for the even-3-parity function with automatically defined functions for a population of 16,000. That is, in approximately two out of five runs, there is at least one solution in generation 0 among the 16,000 programs. In fact, there are 15 such programs in the 528,000 individuals contained in the 33 runs involved, so the probability of solving this problem by means of blind random search with automatically defined functions is about 1:35,200.
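The two figures quoted above can be related to each other with a short calculation, sketched here in Python. The independence of the 16,000 programs in a generation-0 population is our simplifying assumption; the variable names are ours as well.

```python
solutions = 15                 # solutions found in generation 0
programs = 33 * 16_000         # 33 runs, each with a population of 16,000

odds = programs / solutions    # one solution per this many random programs
p_single = 1.0 / odds

# Chance that at least one of M independent random programs is a solution.
M = 16_000
p_at_least_one = 1.0 - (1.0 - p_single) ** M
print(programs, odds, round(p_at_least_one, 3))
```

Under that assumption the estimate of 1:35,200 implies that roughly 37% of populations of 16,000 contain at least one solution in generation 0, which is in the neighborhood of the 39% read from the performance curve.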
A probability of success of 1:35,200 is a considerable improvement over a failure to find a single solution in 10,000,000 tries. Representation is the reason for this difference in probability of success of these two blind random searches. When automatically defined functions are not being used, the elements of the search space are computer programs consisting only of a result-producing branch. However, when automatically defined functions are being used, the elements of the search space are computer programs consisting of two function-defining branches and one result-producing branch. The availability of automatically defined functions provides a significantly different way of representing Boolean functions and a dramatically different way of viewing the space of Boolean functions.
This chapter explores the way automatically defined functions change the way of looking at problem spaces for several different problems from this book. To do this, this chapter compares the distribution of values of fitness for 1,000,000 randomly generated programs without automatically defined functions with the distribution of values of fitness for 1,000,000 randomly generated programs with automatically defined functions. The distributions are examined in tabular form, visualized as histograms, quantified by their means and standard deviations, and further quantified by the values of their outliers. For each problem, the 1,000,000 random programs are generated in the same manner in which the programs were generated when the problem was first treated herein. Specifically, the programs were generated with the same function set, the same terminal set, the same arrangement of defined functions (if any), and the same limitations on program size or depth. The programs were generated as if they belonged to generation 0 of a population of 5,000 and then consolidated into a group of 1,000,000.
26.1 EVEN-3-, 4-, 5-, AND 6-PARITY PROBLEMS
We start with the even-3-, 4-, 5-, and 6-parity problems.
26.1.1 Even-3-Parity Problem
Table 26.1 shows the distribution of values of raw fitness for the even-3-parity function for 1,000,000 randomly created programs with automatically defined functions and 1,000,000 randomly created programs without automatically defined functions. Two two-argument automatically defined functions are used. Raw fitness (hits) ranges between 0 and 8 for this problem.
As can be seen, no programs out of the 1,000,000 without automatically defined functions achieve a perfect score of eight hits. This result is consistent with the experiment cited above involving 10,000,000 random programs. In contrast, 33 programs out of the 1,000,000 with automatically defined functions score eight hits. These 33 100%-correct solutions suggest a probability of 1:30,303 (which is close to the probability of 1:35,200 computed above with the different sample of size 528,000).
There are no programs that score zero hits without automatically defined functions, whereas there are 49 programs that do so with them.
In addition, there are only 33 near-perfect programs scoring seven hits without automatically defined functions, whereas there are 55 such near-perfect programs with them. Similarly, only 46 programs score one hit without automatically defined functions; 57 do so with them.
Table 26.1 Distribution of raw fitness (hits) for the even-3-parity problem with and without ADFs.
Raw fitness (hits)  Without ADFs  With ADFs
0                   0             49
1                   46            57
2                   5,803         2,022
3                   156,227       63,545
4                   675,629       869,656
5                   156,495       62,587
6                   5,767         1,996
7                   33            55
8                   0             33
In other words, when the even-3-parity problem environment is viewed through the lens of automatically defined functions, a blind random search is more likely to find outliers scoring extreme values (such as zero and eight hits) with automatically defined functions than without them. We call this difference the lens effect.
The means of the two distributions are, of course, each four.
The distribution with automatically defined functions has a lesser variance than without automatically defined functions for this particular problem. There are 869,656 programs scoring four hits (the mean) with automatically defined functions versus 675,629 without them. There are only 62,587 programs scoring five hits (i.e., the mean plus one) with automatically defined functions versus 156,495 without them. Similarly, as one would expect because of the symmetry of this problem, there are only 63,545 programs scoring three hits (i.e., the mean minus one) with automatically defined functions versus 156,227 without them. There are only about a third as many programs scoring six hits (and two hits) with automatically defined functions as without them. The standard deviations of the distributions are 0.60 without automatically defined functions and 0.38 with them.
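The means and standard deviations quoted above follow directly from the counts in table 26.1. The Python sketch below recomputes them; it uses the population (not sample) standard deviation, which is our assumption about how the published values were obtained.

```python
import math

# Counts of programs scoring 0..8 hits, taken from table 26.1.
WITHOUT_ADFS = [0, 46, 5_803, 156_227, 675_629, 156_495, 5_767, 33, 0]
WITH_ADFS = [49, 57, 2_022, 63_545, 869_656, 62_587, 1_996, 55, 33]


def mean_and_std(counts):
    """Mean and population standard deviation of hits for a histogram of counts."""
    total = sum(counts)
    mean = sum(h * c for h, c in enumerate(counts)) / total
    variance = sum(c * (h - mean) ** 2 for h, c in enumerate(counts)) / total
    return mean, math.sqrt(variance)


mean_without, std_without = mean_and_std(WITHOUT_ADFS)
mean_with, std_with = mean_and_std(WITH_ADFS)
print(round(mean_without, 2), round(std_without, 2))
print(round(mean_with, 2), round(std_with, 2))
```

Both columns sum to 1,000,000, both means round to 4.00, and the standard deviations round to 0.60 and 0.38. The distribution with automatically defined functions is therefore narrower overall even though it reaches farther into both tails.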
Figure 26.1 shows the hits histograms for 1,000,000 randomly generated programs for the even-3-parity problem, with and without automatically defined functions. The vertical axis for these histograms (and all the other histograms in this chapter) employs a logarithmic scale running between 1 ($10^0$) and 1,000,000 ($10^6$). The bars of the histogram start somewhat below the dotted line at 1 ($10^0$), so as to highlight the absence of any programs scoring a particular value of fitness. For example, in the upper histogram applying to the 1,000,000 programs without automatically defined functions, there are no programs scoring either zero or eight. The logarithmic scale highlights the difference in the distributions of the outliers.
[Figure 26.1 plot: panels "Without Defined Functions" and "With Defined Functions"; horizontal axis: Hits (0 to 8); logarithmic vertical axis.]
Figure 26.1 Hits histograms for the even-3-parity problem for 1,000,000 randomly generated programs with and without ADFs.
In summary, the distribution with automatically defined functions has more extreme outliers, but less variance, than the distribution without them.
If the points in the search space of three-argument Boolean functions are represented as truth tables, there is a 1:256 probability of solving the even-3-parity problem by means of blind random search. If the points in the search space of three-argument Boolean functions are represented as three-branch computer programs consisting of one result-producing branch and two function-defining branches (over the function and terminal sets being used here), there is a probability of about 1:30,303 of solving the even-3-parity problem by means of blind random search. But if the points in the search space of three-argument Boolean functions are represented as single-branch computer programs (consisting only of one result-producing branch), then the probability of solving the even-3-parity problem by means of blind random search is very small (smaller than 1:10,000,000).
Of course, when automatically defined functions are involved in randomly generated programs, the defined functions are simply random defined functions. Since no Darwinian reproduction and no genetic crossover has yet occurred, the fitness measure plays no role in this random generative process.
Table 26.2 Distribution of raw fitness (hits) for the even-4-parity problem with and without ADFs.
Raw fitness (hits)  Without ADFs  With ADFs
0                   0             1
1                   0             0
2                   0             2
3                   0             7
4                   7             52
5                   62            191
6                   3,023         2,099
7                   84,604        37,702
8                   824,910       920,273
9                   84,401        37,424
10                  2,906         2,025
11                  78            154
12                  9             60
13                  0             10
14                  0             0
15                  0             0
16                  0             0
The generation of the two sets of 1,000,000 programs described above does not involve either the Darwinian operation of reproduction or the genetic operation of crossover. The difference in success of finding 100%-correct solutions to the even-3-parity function reflects only the way points in the search space of the problem are represented.
The representation chosen to view the points in the search space of the problem (i.e., the three-argument Boolean function) is a kind of lens through which the system views the world. It appears that a computer program incorporating automatically defined functions provides a different lens for viewing a highly regular, symmetric, and homogeneous function such as the even-3-parity function than does a computer program composed of similar ingredients without automatically defined functions.
Solutions to problems are outliers. If the goal is finding solutions to problems, this different lens may be a better lens. As we will see in the remainder of this chapter, this lens effect appears in other problems.
26.1.2 Even-4-Parity Problem
The lens effect also appears in the even-4-parity problem.
Table 26.2 shows the distribution of values of raw fitness (between 0 and 16) for the even-4-parity function with or without automatically defined functions. Two three-argument automatically defined functions are used. There are no 100%-correct solutions to this problem among the 1,000,000 randomly generated programs either with or without automatically defined functions.
[Figure 26.2 plot: panels "Without Defined Functions" and "With Defined Functions"; horizontal axis: Hits; logarithmic vertical axis.]
Figure 26.2 Hits histograms for the even-4-parity problem for 1,000,000 randomly generated programs with and without ADFs.
As before, the two distributions are different. For example, the distribution with automatically defined functions has 10 outliers scoring 13 hits and 60 outliers scoring 12 hits and also has seven outliers scoring three hits. In contrast, there are no programs scoring 13 or three hits without automatically defined functions. In addition, one program out of the 1,000,000 scores zero hits and two programs score two hits with automatically defined functions. Thus, there is evidence of the lens effect for this problem.
The means of the two distributions are each eight. The distribution shown in this table with automatically defined functions has a lower variance than the distribution without automatically defined functions. There are only 37,424 programs scoring nine hits (i.e., the mean plus one) with automatically defined functions, versus 84,401 without them. Similarly, as one would expect from symmetry, there are only 37,702 programs scoring seven hits (i.e., the mean minus one) with automatically defined functions, versus 84,604 without them. There are only about two-thirds as many programs scoring ten hits and six hits with automatically defined functions as without them. The standard deviation of the distribution is 0.44 without automatically defined functions and 0.31 with them.
Table 26.3 Distribution of raw fitness (hits) for the even-5-parity problem with and without ADFs.
Raw fitness (hits)  Without ADFs  With ADFs
0-7                 0             0
8                   0             1
9                   0             0
10                  0             3
11                  0             8
12                  3             65
13                  33            253
14                  1,003         2,099
15                  25,956        23,134
16                  946,015       948,679
17                  25,996        23,236
18                  980           2,153
19                  21            306
20                  3             62
21                  0             10
22                  0             2
23                  0             0
24                  0             1
25-32               0             0
Figure 26.2 shows the hits histograms for 1,000,000 randomly generated programs for the even-4-parity problem with and without automatically defined functions. The figure shows the presence of outliers farther from the mean when automatically defined functions are involved.
26.1.3 Even-5-Parity Problem
We again see the lens effect for the even-5-parity problem employing two four-argument defined functions.
Table 26.3 shows the distribution of values of raw fitness (between 0 and 32) for the even-5-parity function. The first and last six rows of this table are omitted since none of the 1,000,000 randomly generated programs, with or without automatically defined functions, scores a value of fitness in these ranges. As with the even-4-parity problem, there are no 100%-correct solutions to this problem among the 1,000,000 randomly generated programs either with or without automatically defined functions. There are programs scoring 24, 22, and 21 (and 8, 10, and 11) hits with automatically defined functions whereas there are no programs achieving those scores without automatically defined functions. Moreover, there are considerably more programs scoring 20, 19, and 18 (and 12, 13, and 14) hits with automatically defined functions than without them. Thus, there is again evidence of the lens effect for this problem.
[Figure 26.3 plot: panels "Without Defined Functions" and "With Defined Functions"; horizontal axis: Hits (binned in pairs); logarithmic vertical axis.]
Figure 26.3 Hits histograms for the even-5-parity problem for 1,000,000 randomly generated programs with and without ADFs.
The means of the two distributions are each 16. The standard deviation of the distribution is 0.25 without automatically defined functions and 0.27 with them.
Figure 26.3 shows the hits histograms for 1,000,000 randomly generated programs for the even-5-parity problem with and without automatically defined functions.
26.1.4 Even-6-Parity Problem
Figure 26.4 shows the hits histograms for 1,000,000 randomly generated programs for the even-6-parity problem with and without two five-argument automatically defined functions. As before, the figure shows the presence of outliers farther from the mean when automatically defined functions are involved.
[Figure 26.4 plot: panels "Without Defined Functions" and "With Defined Functions"; horizontal axis: Hits (binned in fours); logarithmic vertical axis.]
Figure 26.4 Hits histograms for the even-6-parity problem for 1,000,000 randomly generated programs with and without ADFs.
26.1.5 Summary for the Parity Problems
Table 26.4 shows the mean and standard deviations of distributions for the even-3-, 4-, 5-, and 6-parity problems of 1,000,000 randomly generated programs, with and without automatically defined functions. The table also shows the highest value of raw fitness (hits) and the number of occurrences of that outlying value of raw fitness. For example, the highest number of hits for the 1,000,000 initial random individuals for the even-3-parity function with automatically defined functions is 8; there are 33 such 8-scoring outliers in the 1,000,000 programs with automatically defined functions.
26.2 THE LAWNMOWER PROBLEM
This section shows evidence of the lens effect for the lawnmower problem for lawn sizes of 32, 48, 64, 80, and 96.
26.2.1 Lawnmower Problem with Lawn Size of 32
Table 26.5 shows that there are no 100%-correct solutions to this problem among the 1,000,000 randomly generated programs for the lawnmower problem (section 8.1) with a lawn size of 32 without automatically defined functions, but there are 60 among the 1,000,000 randomly generated programs with automatically defined functions. There are no programs scoring in the range between 12 and 32 hits without automatically defined functions; however, there are 120,532 programs (about 12% of the 1,000,000) in this range with automatically defined functions. There is again evidence of the lens effect for this problem.
Table 26.4 Summary for distributions for the even-3-, 4-, 5-, and 6-parity problems of 1,000,000 randomly generated programs with and without ADFs.
                    Arity 3         Arity 4         Arity 5         Arity 6
                    Without  With   Without  With   Without  With   Without  With
                    ADFs     ADFs   ADFs     ADFs   ADFs     ADFs   ADFs     ADFs
Mean                4.00     4.00   8.00     8.00   16.0     16.0   32.0     32.0
Standard deviation  0.600    0.380  0.441    0.312  0.246    0.267  0.188    0.226
Best outlier        7        8      12       13     20       24     39       42
Outlier frequency   33       33     9        10     3        1      7        1
Figure 26.5 shows the hits histograms for 1,000,000 randomly generated programs for the 32-square lawnmower problem, with and without automatically defined functions. The mean of the distribution is 2.0 without automatically defined functions and 5.3 with them. The standard deviation is 1.3 without automatically defined functions and 5.4 with automatically defined functions.
26.2.2 Lawnmower Problem with Lawn Size of 48
There are no 100%-correct solutions to this problem among the 1,000,000 randomly generated programs without automatically defined functions, but there are 11 among the 1,000,000 randomly generated programs with automatically defined functions. The largest number of hits scored by a program without automatically defined functions among 1,000,000 randomly generated programs for the 48-square lawnmower problem is 11. There are no programs scoring between 12 and 48 hits without automatically defined functions, but there are 133,579 programs (about 13% of the 1,000,000) having between 12 and 48 hits with automatically defined functions. There is again evidence of the lens effect for this problem.
Figure 26.6 shows the hits histograms for 1,000,000 randomly generated programs for the 48-square lawnmower problem with and without automatically defined functions. The mean of the distribution is 2.06 without automatically defined functions and 5.8 with them. The standard deviation is 1.29 without automatically defined functions and 6.81 with them.
26.2.3 Lawnmower Problem with Lawn Size of 64
There are no 100%-correct solutions to the 64-square lawnmower problem among the 1,000,000 randomly generated programs without automatically defined functions, but there is one 100%-correct solution among the 1,000,000
Table 26.5 Distribution of raw fitness (hits) for the 32-square lawnmower problem with and without ADFs.
Raw fitness (hits)  Without ADFs  With ADFs
0                   44,173        20,068
1                   335,319       147,189
2                   349,395       211,186
3                   150,315       142,631
4                   70,261        107,549
5                   30,674        66,564
6                   13,357        51,547
7                   4,939         37,900
8                   1,369         35,128
9                   248           23,394
10                  55            19,928
11                  7             16,384
12                  0             14,592
13                  0             11,758
14                  0             11,410
15                  0             9,379
16                  0             9,572
17                  0             8,263
18                  0             7,902
19                  0             7,021
20                  0             7,223
21                  0             6,242
22                  0             5,900
23                  0             4,921
24                  0             4,773
25                  0             3,791
26                  0             3,148
27                  0             2,453
28                  0             1,394
29                  0             696
30                  0             359
31                  0             85
32                  0             60
Chapter 26
Without Defined Functions
With Defined Functions
G1 2-3 +5 67 8-9 10-ll 12-13 14-15 16-17 18-19 20-2t 22-23 U-25 26-2'1 28-29 3G3t 32
Hits
&l 2-3 +5 67 8-9 l&ll t2-13 t+15 16-11 \8-19 2U2l 22-23 2+25 26-2',1 28-29 30-31 32
Hits
Figure 26.5 Hits histograms for the 32-square lawnmower problem for 1,000,000 randomly
generated programs with and withoutADFs.
o-2 3-5 6-8 12-14 1r1',7 l&20 21:23 2+26 n-D 3G32 33-35 3G38 3941 4244 4547 48
Hits
u2 3-5 G8
Figure 26.6 Hits histograms
generated programs with and
t8-2D 2r-23 2+25 n-29 3G32 33-35 3G38 3941 4244 454'7 48
Hits
for the 48-square lawnmower problem for 1,000,000 randomly
withoutADFs.
Without Defined Functions
With Defined Functions
The Role of Representation and the Lens Effect
I
I
>> 104
I
(l)
(|)
,1. -
102
l0l
I
0)
g
-
4-7 8-11 12-15 16-19 20-23 24-n 28-31 32-35 3G39 q-$ 4+47 48-51 52-55 56-59 60-63 64
Hits
+7 8-ll 12-15 16-19 20-23 24-21 2A-31 32-35 36-39 4+43 4+47 48-51 52-55 sG59 60-63 64
Hits
Figure 25.7 Hits histograms for the 64-square lawnmower problem for 1,000,000 randomly
generated programs with and withoutADFs.
randomly generated programs with automatically defined functions. There
is a complete absence of programs scoring between 13 and 64 hits without
automatically defined functions; howeve4 there are L24,9t5 programs (over
12% of the 1,000,000) in this range with automatically defined functions.
Figure 26.7 shows the hits histograms for L,000,000 randomly generated
programs for the lawnmower problem with and without automatically defined
functions. The means of the distributions are 2.07 without automatically
defined functions and 6.19 with them. The standard deviations are 1.31 without automatically defined functions ard7.71. with them.
26.2.4 Lawnmower Problem with Lawn Size of 80
The largest number of hits scored among 1,000,000 randomly generated programs for the 80-square lawnmower problem without automatically defined
functions is 12, but the largest number of hits scored by a program among
1,000,000 randomly generated programs with automatically defined functions is 77. There are no programs scoring between 13 and 77 hits without
automatically defined functions, but there are 127,101 programs (about 13%
of the 1,000,000) having between 13 and 77 hits with automatically defined
functions. Thus, there is again evidence of the lens effect for this problem.
Figure 26.8 Hits histograms for the 80-square lawnmower problem for 1,000,000 randomly
generated programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
Figure 26.8 shows the hits histograms for 1,000,000 randomly generated programs for the 80-square lawnmower problem with and without
automatically defined functions. The means of the distributions are 2.06
without automatically defined functions and 6.35 with them. The standard deviations are 1.31 without automatically defined functions and 8.29
with them.
26.2.5 Lawnmower Problem with Lawn Size of 96
The largest number of hits scored among 1,000,000 randomly generated programs for the 96-square lawnmower problem without automatically defined
functions is 12, but the largest number of hits scored by a program among
1,000,000 randomly generated programs with automatically defined functions is 88. There are no programs scoring between 13 and 88 hits without
automatically defined functions, but there are 128,088 programs (about 13%
of the 1,000,000) having between 13 and 88 hits with automatically defined
functions. This problem also shows evidence of the lens effect.
Figure 26.9 shows the hits histograms for 1,000,000 randomly generated programs for the 96-square lawnmower problem with and without
automatically defined functions. The mean of the distribution is 2.07 without automatically defined functions and 6.43 with them. The standard
deviation is 1.31 without automatically defined functions and 8.60
with them.
Figure 26.9 Hits histograms for the 96-square lawnmower problem for 1,000,000 randomly
generated programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
26.2.6 Summary for the Lawnmower Problem
Table 26.6 shows the mean and standard deviations of distributions for
the lawnmower problem with lawn sizes of 32, 48, 64, 80, and 96 for 1,000,000
randomly generated programs, with and without automatically defined
functions. The table also shows the highest value of raw fitness (hits) and
the number of occurrences of that outlying value of raw fitness.
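The four summary quantities reported for each distribution (mean, standard deviation, best outlier, and outlier frequency) can all be derived from a hits histogram. The following is a minimal sketch of that arithmetic, not the procedure used for the book's tables. The function name `summarize_hits` and the toy histogram are hypothetical, and the population form of the standard deviation is an assumption.

```python
import math


def summarize_hits(histogram):
    """Summarize a hits histogram.

    `histogram` maps a hits value to the number of programs that scored it.
    Returns (mean, standard deviation, best outlier, outlier frequency).
    The population (not sample) standard deviation is used; this is an
    assumption, since the text does not say which form was computed.
    """
    total = sum(histogram.values())
    mean = sum(hits * count for hits, count in histogram.items()) / total
    variance = sum(count * (hits - mean) ** 2
                   for hits, count in histogram.items()) / total
    # The best outlier is the highest hits value that actually occurs.
    best = max(hits for hits, count in histogram.items() if count > 0)
    return mean, math.sqrt(variance), best, histogram[best]


# Hypothetical toy histogram (not data from the tables in this chapter).
toy_histogram = {0: 4, 1: 3, 2: 2, 3: 0, 4: 1}
mean, std, best_outlier, outlier_frequency = summarize_hits(toy_histogram)
print(mean, std, best_outlier, outlier_frequency)
```

A distribution showing the lens effect would differ from its counterpart mainly in the last two values: a higher best outlier, usually with a larger standard deviation.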
26.3 THE BUMBLEBEE PROBLEM
This section shows evidence of the lens effect for the bumblebee problem
with 10, 15, 20, and 25 flowers.
26.3.1 Bumblebee Problem with 10 Flowers
Figure 26.10 shows the hits histograms for 1,000,000 randomly generated programs for the bumblebee problem with 10 flowers, with and without automatically defined functions.
Table 26.6 Summary of distributions for the lawnmower problem with lawn sizes of 32, 48, 64, 80,
and 96 for 1,000,000 randomly generated programs with and without ADFs.
Lawn size           32       32       48       48       64       64       80       80       96       96
                    Without  With     Without  With     Without  With     Without  With     Without  With
                    ADFs     ADFs     ADFs     ADFs     ADFs     ADFs     ADFs     ADFs     ADFs     ADFs
Mean                2.05     5.34     2.06     5.88     2.07     6.19     2.07     6.35     2.07     6.43
Standard deviation  1.27     5.41     1.29     6.81     1.31     7.71     1.31     8.29     1.31     8.60
Best outlier        11       32       11       48       12       64       12       77       12       88
Outlier frequency   7        60       10       11                1        1        3
Figure 26.11 Hits histograms for the bumblebee problem with 15 flowers for 1,000,000 randomly generated programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
Figure 26.12 Hits histograms for the bumblebee problem with 20 flowers for 1,000,000 randomly generated programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
26.3.2 Bumblebee Problem with 15 Flowers
Figure 26.11 shows the hits histograms for 1,000,000 randomly generated programs for the bumblebee problem with 15 flowers, with and without automatically defined functions.
26.3.3 Bumblebee Problem with 20 Flowers
Figure 26.12 shows the hits histograms for 1,000,000 randomly generated programs for the bumblebee problem with 20 flowers, with and without automatically defined functions.
26.3.4 Bumblebee Problem with 25 Flowers
Figure 26.13 shows the hits histograms for 1,000,000 randomly generated programs for the bumblebee problem with 25 flowers, with and without automatically defined functions.
Figure 26.13 Hits histograms for the bumblebee problem with 25 flowers for 1,000,000 randomly generated programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
26.3.5 Summary for Bumblebee Problem
Table 26.7 shows the mean and standard deviations of distributions for the
bumblebee problem with 10, 15, 20, and 25 flowers for 1,000,000 randomly
generated programs with and without automatically defined functions. The
table also shows the highest value of raw fitness (hits) and the number of
occurrences of that outlying value of raw fitness.
26.4 OBSTACLE-AVOIDING-ROBOT PROBLEM
This section shows that the obstacle-avoiding-robot problem (chapter 13)
also shows evidence of the lens effect. Raw fitness (hits) ranges between 0
and 116 for this problem. There are no programs scoring 19 or more hits
without automatically defined functions, but there are 36,094 programs
(3.6% of the 1,000,000) scoring between 19 and 91 hits with automatically
defined functions.
Figure 26.14 shows the hits histograms for 1,000,000 randomly generated
programs for the obstacle-avoiding-robot problem with and without automatically defined functions. The mean of the distribution is 2.83 without
automatically defined functions and 5.47 with them. The standard deviation
is 1.98 without automatically defined functions and 5.78 with them.
Table 26.7 Summary of distributions for the bumblebee problem with 10, 15, 20, and 25 flowers for 1,000,000 randomly generated programs with and without ADFs.
Number 10
of flowers
10 15 15 20 25
Without With Without With
ADFs ADFs ADFs ADFs
Without With
ADFs ADFs
Without With
ADFs ADFs
Mean
Standard
deviation
Best
outlier
Outlier
frequency
0.0402
0.268
3
4
0.0329 0.161
0.235 0.451
0.131 0.162
0.402 0.453
0.133 0.162 0.133
0.406 0.453 0.405
25 24
Figure 26.14 Hits histograms for the obstacle-avoiding-robot problem for 1,000,000 randomly
generated programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
Figure 26.15 Hits histograms for the minesweeper problem for 1,000,000 randomly generated
programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
26.5 MINESWEEPER PROBLEM
The minesweeper problem (chapter 14) also shows evidence of the lens effect. Raw fitness (hits) ranges between 0 and 116. There are no programs scoring 21 or more hits without automatically defined functions, but there are
I4A1S programs (1.4% of the 1,000,000) scoring between 21 and 58 hits with
automatically defined functions.
Figure 26.15 shows the hits histograms for 1,000,000 randomly generated
programs for the minesweeper problem with and without automatically defined functions. The mean of the distribution is 2.SL without automatically
defined functions and 4.85 with them. The standard deviation is 1.96 without
automatically defined functions and 4.23 with them.
26.6 ARTIFICIAL ANT PROBLEM
There is evidence of a slight lens effect for the artificial ant problem (chapter
12). Raw fitness (hits) ranges between 0 and 96 for this problem. There are no
programs scoring between 88 and 92 hits without automatically defined functions, but there are four programs in this range with automatically defined
functions. Moreover, between 57 and 87 hits, every entry in the table for
automatically defined functions is higher than the corresponding entry without automatically defined functions. There are only L,SSg programs (0.18% of
the 1,000,000) without automatically defined functions between 57 and 87
hits, whereas there are 4,007 programs (0.40% of the 1,000,000) with
automatically defined functions.
The mean of the distribution is 10.08 without automatically defined functions and 10.06 with them. The standard deviation is 12.64 without automatically defined functions and 11.17 with them.
Figure 26.16 shows the hits histograms for 1,000,000 randomly generated
programs for the artificial ant problem with and without automatically defined
functions.
Figure 26.16 Hits histograms for the artificial ant problem for 1,000,000 randomly generated
programs with and without ADFs. (Panels: Without Defined Functions; With Defined Functions. Horizontal axis: Hits.)
26.7 DISCUSSION
We have observed, for the problems reviewed in this chapter, that when the
representation of problem environments employs automatically defined functions, a blind random search yields higher scoring outliers than when the
representation does not employ automatically defined functions. The possible contribution of this lens effect to the operation of genetic programming
with automatically defined functions warrants further investigation.
Reynolds (1994a) considers three versions of a problem calling for the discovery of a controller for a corridor-following robot (subsection F.3.4 in
appendix F). The robot had a roving sensor, an arbitrary static sensor, and a
predetermined static sensor in the three versions. The histograms of fitness
in the initial random generation of Reynolds's runs were distinctly different
for the three versions. The version of Reynolds's problem with the best outlier
proved to be the easiest to solve. These differences foreshadowed the difficulty of solving the problem in the actual full runs of genetic programming
and are another indication of the existence of the lens effect.
27 Conclusion
Main point 1 was stated as follows in chapter 1:
Main point 1: Automatically defined functions enable genetic programming to solve a variety of problems in a way that can be interpreted as a
decomposition of a problem into subproblems, a solving of the subproblems,
and an assembly of the solutions to the subproblems into a solution to the
overall problem (or which can alternatively be interpreted as a search for
regularities in the problem environment, a change of representation, and a
solving of a higher level problem).
The numerous illustrative problems in this book provide evidence in support of this conclusion that genetic programming with ADFs does indeed
work.
Main point 2 was stated as follows:
Main point 2: Automatically defined functions discover and exploit the
regularities, symmetries, homogeneities, similarities, patterns, and modularities of the problem environment in ways that are very different from the style
employed by human programmers.
This point has, of course, been repeatedly made throughout this book in
the many examples.
Main point 3 was stated as follows:
Main point 3: For a variety of problems, genetic programming requires less
computational effort to solve a problem with automatically defined functions
than without them, provided the difficulty of the problem is above a certain
relatively low problem-specific breakeven point for computational effort.
Table 27.1 summarizes the efficiency ratio, R_E, and the structural complexity ratio, R_S, for various problems covered in this book.
An examination of the rightmost column of the first five rows of the table
indicates that the efficiency ratio, R_E, is less than 1 (indicating that a greater
number of fitness evaluations is required to yield a solution of the problem
with automatically defined functions than without them). These five rows
correspond to the simple two-boxes problem (chapter 4) and the simpler versions of the four problems in chapter 5 that straddle the breakeven point for
computational effort. However, starting with the scaled-up versions of the
four problems in chapter 5 and continuing all the way down the rightmost
column of the table, we see that all the other problems in this book have an
efficiency ratio of greater than 1 (indicating that fewer fitness evaluations are
required to yield a solution to the problem with automatically defined functions than without them).
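Both ratios are read the same way: the quantity measured without automatically defined functions is divided by the quantity measured with them, so a value above 1 favors automatically defined functions. The sketch below illustrates that reading only. The helper names and the effort values are hypothetical and are not taken from the book.

```python
def ratio_without_to_with(value_without, value_with):
    """Ratio of a quantity measured without ADFs to the same quantity with ADFs.

    With computational effort as the quantity this is the efficiency ratio;
    with average structural complexity it is the structural complexity ratio.
    """
    if value_with <= 0:
        raise ValueError("the with-ADFs value must be positive")
    return value_without / value_with


def favors_adfs(ratio):
    """A ratio above 1 means the with-ADFs quantity is the smaller one."""
    return ratio > 1.0


# Hypothetical computational efforts (individuals processed), for illustration.
effort_without = 1_200_000
effort_with = 400_000
r_e = ratio_without_to_with(effort_without, effort_with)
print(r_e, favors_adfs(r_e))
```

Applied to a ratio of 0.53, `favors_adfs` returns `False`, which is the situation described for problems below the breakeven point.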
Main point 4 was stated as follows:
Main point 4: For a variety of problems, genetic programming usually
yields solutions with smaller overall size (lower average structural complexity) with automatically defined functions than without them, provided the
difficulty of the problem is above a certain problem-specific breakeven point
for average structural complexity.
Starting with the even-4-parity problem on the ninth row of the table and
continuing down the rightmost column of the table, we see that the structural
complexity ratio is greater than 1 (indicating a smaller average size for successful runs of the problem with automatically defined functions than without them) except for three isolated exceptions. The exceptions occur for
the two extreme values of the architectural parameters for the even-5-parity
problem (out of 15 combinations considered) and the subset-creating version
of the transmembrane problem (where the "average" structural complexity
comes from only one successful run).
Main point 5 was stated as follows:
Main point 5: For the three problems herein for which a progression of
several scaled-up versions is studied, the average size of the solutions produced by genetic programming increases as a function of problem size at a
lower rate with automatically defined functions than without them.
Main point 6 was stated as follows:
Main point 6: For the three problems herein for which a progression of
several scaled-up versions is studied, the computational effort increases as a
function of problem size at a lower rate with automatically defined functions
than without them.
The evidence reported in sections 6.15 (parity problems), 8.15 (lawnmower
problems), and 9.13 (bumblebee problems) supports main points 5 and 6.
Main point 7 is closely related to main points 5 and 6 and was stated as
follows:
Main point 7: For the three problems herein for which a progression of
several scaled-up versions is studied, the benefits in terms of computational
effort and average structural complexity conferred by automatically defined
functions increase as the problem size is scaled up.
Specifically, table 27.1 shows that the efficiency ratios for the group of even-parity problems are 1.50, 2.18, 14.07, and 52.2 as the problem is scaled up
from three, to four, to five, and to six arguments, respectively. The efficiency
ratios for the group of lawnmower problems are 3.80, 6.22, 9.09, 33.0, and
283.7, as the lawn size is scaled up from 32, to 48, to 64, to 80, and to 96,
respectively. The efficiency ratios for the group of bumblebee problems are 1.20, 1.21, 1.24, and 3.2, as the number of flowers is scaled up from 10, to 15, to
20, and to 25, respectively. These monotonically increasing efficiency ratios
exhibited within all three groups of problems suggest that the facilitating benefits of automatically defined functions increase as problems are scaled up.
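The claim of monotonically increasing ratios is easy to check mechanically. The sketch below does so for the even-parity ratios and for the first four lawnmower ratios quoted above. The helper names are hypothetical, and the step-to-step growth factors are derived arithmetic rather than figures reported in the book.

```python
def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def successive_growth(values):
    """Factor by which each value exceeds its predecessor."""
    return [b / a for a, b in zip(values, values[1:])]


# Efficiency ratios as quoted in the text.
parity_ratios = [1.50, 2.18, 14.07, 52.2]     # even-3- to even-6-parity
lawnmower_ratios = [3.80, 6.22, 9.09, 33.0]   # lawn sizes 32, 48, 64, 80

for name, ratios in (("parity", parity_ratios), ("lawnmower", lawnmower_ratios)):
    growth = [round(g, 2) for g in successive_growth(ratios)]
    print(name, is_strictly_increasing(ratios), growth)
```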
Table 27.1 Summary table of the structural complexity ratio, R_S, and the efficiency ratio, R_E, for
various problems.
Problem                                                 Reference   R_S       R_E
Two boxes                                               Chapter 4   0.53      0.53
Quintic polynomial x^5 - 2x^3 + x                       5.1.2       1.07      0.33
Boolean 5-symmetry                                      5.2.2       0.80      0.56
Three-sines sin x + sin 2x + sin 3x                     5.3.1       1.09      0.89
Two-term x/π + x^2/π^2                                  5.4.2       0.82      0.38
Sextic polynomial x^6 - 2x^4 + x^2                      5.1.1       0.98      7.22
Boolean 6-symmetry                                      5.2.1       1.82      1.82
Four-sines sin x + sin 2x + sin 3x + sin 4x             5.3.2       1.30      1.28
Three-term x/π + x^2/π^2 + 2πx                          5.4.1       0.92      1.32
Even-3-parity - M = 16,000                              6.3, 6.10   0.92      1.50
Even-4-parity - M = 16,000                              6.4, 6.11   1.87      2.18
Even-5-parity - M = 16,000                              6.5, 6.12   1,.9-J,   14.07
Even-6-parity - M = 16,000                              6.6, 6.13   1.j7      52.20
Even-5-parity - M = 4,000 - one two-argument ADF        7.4.2       3.64      5.44
Even-5-parity - M = 4,000 - one three-argument ADF      7.4.3       2.51      4.25
Even-5-parity - M = 4,000 - one four-argument ADF       7.4.4       1.81      2.76
Even-5-parity - M = 4,000 - two two-argument ADFs       7.4.5       3.01      6.00
Even-5-parity - M = 4,000 - two three-argument ADFs     7.4.6       1.97      4.08
Even-5-parity - M = 4,000 - two four-argument ADFs      7.4.7       7.32      2.49
Even-5-parity - M = 4,000 - three two-argument ADFs     7.4.8       2.51      4.29
Even-5-parity - M = 4,000 - three three-argument ADFs   7.4.9       1j0       6.00
Even-5-parity - M = 4,000 - three four-argument ADFs    7.4.10      1.11      2.43
Even-5-parity - M = 4,000 - four two-argument ADFs      7.4.11      2.29      4.53
Even-5-parity - M = 4,000 - four three-argument ADFs    7.4.12      1.38      3.88
Even-5-parity - M = 4,000 - four four-argument ADFs     7.4.13      0.77      1.79
Even-5-parity - M = 4,000 - five two-argument ADFs      7.4.14      2.07      4.53
Even-5-parity - M = 4,000 - five three-argument ADFs    7.4.15      1.21      3.18
Even-5-parity - M = 4,000 - five four-argument ADFs     7.4.16      0.69      2.22
Lawnmower - lawn size 32                                8.4, 8.10   2.19      3.80
Lawnmower - lawn size 48                                8.5, 8.11   3.15      6.22
Lawnmower - lawn size 64                                8.3, 8.9    3.65      9.09
Lawnmower - lawn size 80                                8.6, 8.12   4.65      33.00
Lawnmower - lawn size 96                                8.7, 8.13   5.06      234.60
Bumblebee - 10 flowers                                  9.10, 9.11  1.49      1.20
Bumblebee - 15 flowers                                  9.8, 9.9    1.72      1.21
Bumblebee - 20 flowers                                  9.6, 9.7    112       1.24
Bumblebee - 25 flowers                                  9.3, 9.5    1.84      3.20
Impulse response                                        Chapter 11  1.81      1.46
Artificial ant                                          Chapter 12  7.27      2.00
Obstacle-avoiding robot                                 Chapter 13  2.71      3.27
Minesweeper                                             Chapter 14  2.86      6.87
Chapters 21 through 25 provide evidence in support of main point 8 which
is as follows:
Main point 8: Genetic programming is capable of simultaneously solving
a problem and evolving the architecture of the overall program.
The problems of chapters 15 through 20 were so time-consuming that we
were unable to make enough successful runs to make meaningful performance curves (and they therefore do not appear in table 27.1). Nonetheless, the
results of experiments with these problems provide additional evidence supporting the main points of this book. For the letter-recognition problem (chapter 15), the three-way classification of flushes and fours-of-a-kind (chapter
16), the set-creating and arithmetic-performing versions of the transmembrane
problem (chapter 18), the omega-loop problem (chapter 19), and the lookahead
version of the transmembrane problem (chapter 20), there was evidence based
on a small number of runs that automatically defined functions do indeed
facilitate solution to these problems. The evidence took the form that the problem was solved with apparent ease when automatically defined functions
were used, but only rarely when they were not used. In other cases, genetic
programming was only able to come close to a solution when automatically
defined functions were not used, but was able to solve the problem one or
more times when they were used.
Finally, when the architecture was evolved in chapters 21 through 25, we
repeatedly saw that architectures employing automatically defined functions
consistently won the competitive battle within the population. This fact further supports the proposition that automatically defined functions are generally beneficial in genetic programming.
In summary, the evidence from this book supports the proposition that
automatically defined functions should become a standard part of the genetic
programmer's toolkit. Automatically defined functions work so well for so
many different problems that anyone using genetic programming should also
try automatically defined functions on their problem.
Appendix A: List of Special Symbols
Table A.1 shows the definition and a reference (chapter or section) for each of
the special symbols defined and used in multiple places in this book.
Table A.1 Special symbols.
Symbol    Definition    Reference
ADF0    Automatically defined function 0 (the function defined by the first function-defining branch of an overall program).    4.6
C    Correlation.    16.2, 18.5.2
E    Computational effort as measured by the minimum value of I(M,i,z) over all the generations between 0 and G. E = I(M,i*,z) = (i* + 1) M R(z). E is one of the two numbers appearing in the oval of the performance curves.    4.11
E_with    Computational effort with automatically defined functions.    4.11
E_without    Computational effort without automatically defined functions.    4.11
G    Maximum number of generations to be run.    2.1
ℱ_adf0    Function set for automatically defined function ADF0.    4.6
ℱ_rpb    Function set for the result-producing branch RPB.    4.6
ℱ    Function set.    4.2
i    Current generation number.    2.1
i*    The best generation number (i.e., the number of the first generation for which the minimum value of I(M,i,z) is achieved). i* is one of the two numbers appearing in the oval of the performance curves.    4.11
I(M,i,z)    Total number of individuals that must be processed to yield a solution (or satisfactory result) by generation i with probability z using a population of size M.    4.11
IPB0    Iteration-performing branch.    1g.4
ITB0    Iteration-terminating branch.    20.2
K    Number of characters in alphabet.    2.1
L    Length of string.    2.1
M    Population size.    2.1
NIL    The Boolean constant denoting false.    2.2
P(M,i)    Cumulative probability of success by generation i with population size M.    4.11
R(z)    The value of R(M,i,z) for the best generation i*.    4.11
R(M,i,z)    Number of independent runs required to yield a solution (or satisfactory result) by generation i, for a population size of M, with a probability of z. R(M,i,z) is computed from P(M,i) and z.    4.11
R_E    Efficiency ratio between the value of E_without without automatically defined functions to the value of E_with with ADFs.    4.11
R_S    Structural complexity ratio between the value of S_without without automatically defined functions to the value of S_with with ADFs.    4.10
R_W    Wallclock ratio between the value of W_without without automatically defined functions to the value of W_with with ADFs.    8.16
ℜ_bigger-reals    Floating-point random constants ranging between -10.000 and +10.000 (with a granularity of 0.001).    11.2
ℜ_Boolean    Random Boolean constants (T or NIL).    22.1
ℜ_reals    Floating-point random constants ranging between -1.000 and +1.000 (with a granularity of 0.001).    5.1.1.1
ℜ_real-vector    Vector random constants ranging between -5.0000 and +5.0000 with floating-point numbers as components.    9.2
ℜ_ternary    Ternary random constants from the set {T, NIL, :UNDEFINED}.    24.2
ℜ_v8    Vector random constants ranging between (0, 0) and (7, 7) with integers modulo 8 as components.    8.2
RPB    Result-producing branch (the last branch of an overall program).    4.6
S    Average structural complexity (number of functions and terminals) in a set of programs (usually the set of successful runs).    4.10
S_with    Average structural complexity with ADFs.    4.10
S_without    Average structural complexity without ADFs.    4.10
T    The Boolean constant denoting true.    2.2
𝒯    Terminal set.    4.2
𝒯_adf0    Terminal set for automatically defined function ADF0.    4.6
𝒯_rpb    Terminal set for the result-producing branch RPB.    4.6
:UNDEFINED    The ternary constant denoting an undefined value.    24.1
W(M,i,z)    Average elapsed wallclock time in order to yield a solution (or satisfactory result) by generation i, for a population size, M, with a probability of z.    8.16
W_with    Wallclock time with ADFs.    8.16
W_without    Wallclock time without ADFs.    8.16
Y(M,i)    Observed instantaneous probability that a run yields in a population of size M, for the first time, at least one program that satisfies the success predicate of the problem on generation i.    4.11
z    Probability threshold desired for finding at least one successful run in a series of runs. z is 99% throughout this book.    4.11
#    "Don't care" symbol in a schema or in a rule of a genetic classifier system.    2.1, 4.5
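The definitions of P(M,i), R(M,i,z), I(M,i,z), E, and i* chain together into a short computation. The sketch below is one way to write it, under a stated assumption: the table says only that R(M,i,z) is computed from P(M,i) and z, and the relation used here, R = ceil(log(1 - z) / log(1 - P(M,i))), is the customary one and should be treated as an assumption. The function names and the sample probabilities are hypothetical.

```python
import math


def runs_required(p_success, z=0.99):
    """R(M,i,z): independent runs needed to succeed with probability z.

    Assumes R = ceil(log(1 - z) / log(1 - P(M,i))); see the lead-in.
    """
    if p_success <= 0.0:
        return math.inf          # no run has succeeded yet by this generation
    if p_success >= 1.0:
        return 1                 # a single run always succeeds
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p_success))


def individuals_to_process(population_size, generation, p_success, z=0.99):
    """I(M,i,z) = M * (i + 1) * R(M,i,z)."""
    return population_size * (generation + 1) * runs_required(p_success, z)


def computational_effort(population_size, cumulative_success, z=0.99):
    """Return (E, i*): the minimum of I(M,i,z) and the first generation achieving it.

    `cumulative_success[i]` is P(M,i) for generation i (generation 0 first).
    Returns None if no generation has a positive success probability.
    """
    best = None
    for generation, p_success in enumerate(cumulative_success):
        if p_success <= 0.0:
            continue
        effort = individuals_to_process(population_size, generation, p_success, z)
        if best is None or effort < best[0]:   # strict "<" keeps the first minimum
            best = (effort, generation)
    return best


# Hypothetical cumulative success probabilities for generations 0 through 3.
print(computational_effort(4000, [0.0, 0.1, 0.5, 0.6]))
```

Running a few more generations raises P(M,i) but also raises the factor (i + 1), which is why the minimum can fall before the last generation.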
Appendix B: List of Special Functions
Table B.1 shows the name, number of arguments, and a reference for certain
special functions used in this book.
Table B.1 Special functions used in this book.
Name     Function                               Number of arguments    Reference
%        Protected division                                            4.2, 11.2
IF       If                                                            15.2
IFGTZ    If Greater Than Zero                                          18.10
IFLTE    If Less Than or Equal                                         11.2, 18.5.1
EXPP     Protected exponentiation                                      11.2
ORN      Numerically valued disjunction (OR)                           18.5.1
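Two of these functions can be sketched briefly. The behavior shown is the convention commonly used for them in genetic programming and is assumed here rather than taken from this table: protected division returns 1 when the divisor is zero, and IFLTE takes four arguments and returns the third if the first is less than or equal to the second, otherwise the fourth. The Python names are hypothetical.

```python
def protected_division(numerator, denominator):
    """Protected division (%): assumed to return 1 when the divisor is zero."""
    if denominator == 0:
        return 1.0
    return numerator / denominator


def iflte(a, b, then_value, else_value):
    """If Less Than or Equal: then_value if a <= b, otherwise else_value.

    Both branch values are already evaluated when this Python function is
    called; an implementation with side-effecting branches would need to
    evaluate only the selected one.
    """
    return then_value if a <= b else else_value


print(protected_division(6.0, 3.0))   # ordinary quotient
print(protected_division(6.0, 0.0))   # protected case
print(iflte(1, 2, "then", "else"))
```

Protecting the function this way guarantees closure: any evolved expression can be evaluated on any input without raising an error.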
Appendix C: List of Fonts
Table C.1 shows the usage of each of the type fonts used in this book.
Table C.1 Fonts used in this book.
Example          Usage of font
ADF0             Parts of computer programs
A, C, G, T, U    Nucleotide bases in DNA or RNA
A, C, D, ...     The 20 amino acid residues used in proteins
H, C, O, N, S    Chemical elements: hydrogen, carbon, oxygen, nitrogen, sulfur
N, C             The N-terminal end and the C-terminal end of a protein
Appendix D: Default Parameters for
Controlling Runs of Genetic Programming
Runs of genetic programming are controlled by 21 control parameters,
including two major numerical parameters and 19 minor parameters. The 19
minor parameters consist of 11 numerical parameters and eight qualitative
variables that control various alternative ways of executing a run. Two of the
minor variables discussed below were not included in the list of parameters
in Genetic Programming (section 6.9) and are new to this volume.
Except as otherwise specifically indicated, the values of all 21 control
parameters are fixed at the default values specified below throughout this
book. The default values are used in the vast majority of cases.
The two major numerical parameters are the population size, M, and the
maximum number of generations to be run, G.
• The default population size, M, is 4,000. (Populations of 1,000, 8,000, or 16,000
are used for certain problems herein.)
• The default value for the maximum number of generations to be run, G, is
51 (an initial random generation, called generation 0, plus 50 subsequent
generations). (A value of 21 is used occasionally.)
Because of their importance, these two major parameters are explicitly mentioned in the tableau of every problem even when the default values are
being used.
Many of the 19 minor parameters are direct analogs of parameters that are
used in connection with the conventional genetic algorithm; some are specific to genetic programming.
We have intentionally made the same choices for the default values for the
various minor parameters as in Genetic Programming (section 6.9) with two
exceptions. One exception is that the selection method is tournament selection (with a group size of seven) as opposed to fitness-proportionate reproduction for every run herein (except those in section 5.1,6). The second exception
concerns the method of randomization of fitness cases (a variable that was
not specifically identified as a control variable in Genetic Programming).
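Tournament selection with a group size of seven can be sketched in a few lines. This is an illustrative sketch, not the book's implementation. It assumes that the seven contestants are drawn at random with replacement and that a larger fitness value is better; the function name and parameters are hypothetical.

```python
import random


def tournament_select(population, fitness, group_size=7, rng=random):
    """Pick `group_size` individuals at random and return the fittest of them.

    Assumptions: contestants are drawn with replacement, and `fitness`
    returns a value for which larger is better (for example, hits).
    """
    contestants = [rng.choice(population) for _ in range(group_size)]
    return max(contestants, key=fitness)


# Toy usage: individuals are integers and fitness is the integer itself.
population = list(range(100))
winner = tournament_select(population, fitness=lambda x: x,
                           rng=random.Random(0))
print(winner)
```

Unlike fitness-proportionate selection, this scheme depends only on the rank order of fitness within the sampled group, not on the magnitudes of the fitness values.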
Table D.1 Default values of the 21 control parameters for genetic programming.
Two major parameters
• Population size M = 4,000.
• Maximum number of generations to be run G = 51.
Eleven minor numerical parameters
• Probability p_c of crossover = 90%.
• Probability p_r of reproduction = 10%.
• Probability p_ip of choosing internal points for crossover = 90%.
• Maximum size D_created for programs created during the run = 17.
• Maximum size D_initial for initial random programs = 6.
• Probability p_m of mutation = 0.0%.
• Probability p_p of permutation = 0.0%.
• Frequency f_ed of editing = 0.
• Probability p_en of encapsulation = 0.0%.
• Condition for decimation = NIL.
• Decimation target percentage p_d = 0.0%.
Eight minor qualitative variables
• The generative method for initial random population is ramped half-and-half.
• The basic selection method is tournament selection with a group size of
seven.
• Spousal selection method is tournament selection with a group size of seven.
• Adjusted fitness is not used.
• Over-selection is not used.
• The elitist strategy is not used.
• The randomization, if any, involved in the creation of the fitness cases for a
problem is fixed for all runs of the problem.
• In structure-preserving crossover, the way of assigning types to the
noninvariant points of a program is branch typing.
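The defaults in table D.1 can be gathered into a single configuration object. The sketch below is one hypothetical way to do that. The class name, the field names, and the helper `offspring_counts` are illustrative and are not identifiers from the book, and only the numerical defaults plus the selection settings are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPDefaults:
    """Default control parameters, following table D.1 (field names are hypothetical)."""
    population_size: int = 4000            # M
    max_generations: int = 51              # G (generation 0 plus 50 more)
    p_crossover: float = 0.90              # p_c
    p_reproduction: float = 0.10           # p_r
    p_internal_point: float = 0.90         # p_ip
    max_depth_created: int = 17            # D_created
    max_depth_initial: int = 6             # D_initial
    p_mutation: float = 0.0                # p_m
    p_permutation: float = 0.0             # p_p
    editing_frequency: int = 0             # f_ed
    p_encapsulation: float = 0.0           # p_en
    decimation_target: float = 0.0         # p_d
    selection_method: str = "tournament"
    tournament_size: int = 7


def offspring_counts(cfg):
    """Individuals produced per generation by crossover and by reproduction."""
    by_crossover = round(cfg.population_size * cfg.p_crossover)
    by_reproduction = cfg.population_size - by_crossover
    return by_crossover, by_reproduction


print(offspring_counts(GPDefaults()))
print(offspring_counts(GPDefaults(population_size=16000)))
```

Making the dataclass frozen reflects the statement that the defaults stay fixed across runs; a problem that departs from them would construct a new instance with the changed field, as in the second call above.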
The eleven minor numerical parameters used to control the process are
described below:
• The probability of crossover, p_c, is 0.90. That is, crossover is performed such that the number of individuals produced as offspring by the crossover operation is equal to 90% of the population size on each generation. For example, if the population size is 16,000, then 14,400 individuals are produced as offspring by the crossover operation on each generation.
• The probability of reproduction, p_r, is 0.10. That is, for each generation, reproduction is performed on a number of individuals equal to 10% of the population size. For example, if the population size is 16,000, 1,600 individuals are selected (with reselection allowed) to participate in reproduction on each generation.
• In choosing crossover points, we use a probability distribution that allocates p_ip = 90% of the crossover points equally among the internal points of each tree and p_ep = 1 - p_ip = 10% of the crossover points equally among the external points of each tree (i.e., the terminals). The choice of crossover points is further restricted so that if the root of any branch is chosen as the point of insertion for a parent, then the crossover point of the other parent may not be merely a terminal.
• A maximum size (measured by depth), D_created, is 17 for programs created by the crossover operation for all runs not using the array method of representation. (The array method is described below; it is used only for the 3-, 4-, 5-, and 6-parity problems in chapter 6 and the comparative study of the 15 architectures of the even-5-parity problem of chapter 7.) If a particular offspring created by crossover exceeds the applicable limit, the crossover is aborted as to that particular offspring. If offspring 1 is unacceptable, parent 1 becomes offspring 1. Similarly, if offspring 2 is unacceptable, parent 2 becomes offspring 2.
• A maximum size (measured by depth), D_initial, is 6 for the random individuals generated for the initial population.
• The probability of mutation, p_m, specifying the frequency of performing the operation of mutation is 0.
• The probability of permutation, p_p, specifying the probability of performing the operation of permutation is 0.
• The parameter specifying the frequency, f_ed, of applying the operation of editing is 0.
• The probability of encapsulation, p_en, specifying the probability of performing the operation of encapsulation is 0.
• The condition for invoking the decimation operation is set to NIL. That is, decimation is not used.
• The decimation percentage, p_d (which is irrelevant if the condition for invoking the decimation operation is NIL), is arbitrarily set to 0.
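The size rule for crossover offspring described above can be sketched in a few lines of Common LISP. This is only an illustration and not the kernel code from Genetic Programming: the names tree-depth and accept-offspring are hypothetical, and the sketch assumes that a lone terminal has depth 1, which may differ from the depth convention used by the actual kernel.

```lisp
;; Hypothetical helper: depth of an S-expression.
;; Assumption: a lone terminal counts as depth 1.
(defun tree-depth (tree)
  (if (consp tree)
      (+ 1 (reduce #'max (mapcar #'tree-depth (rest tree))
                   :initial-value 0))
      1))

;; Hypothetical helper: if OFFSPRING exceeds MAX-DEPTH, the crossover is
;; aborted for that offspring and the corresponding PARENT is used instead.
(defun accept-offspring (offspring parent max-depth)
  (if (<= (tree-depth offspring) max-depth)
      offspring
      parent))
```

With the default limit, a call such as (accept-offspring offspring-1 parent-1 17) would return offspring-1 unless it is too deep, in which case parent-1 takes its place in the new generation.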
Eight minor qualitative variables control the way a run of genetic programming is executed. The first six of these variables were included in Genetic Programming (section 6.9); the last two are new to this volume.
• The generative method for the initial random population is ramped half-and-half.
• The method of selection for reproduction and for the first parent in crossover is tournament selection (with a group size of seven). This choice differs from Genetic Programming, where the method of selection was fitness-proportionate reproduction (and where greedy over-selection was used for larger population sizes). In tournament selection, a specified group of individuals is chosen with a uniform random probability distribution from the population and the one with the best fitness (i.e., the lowest standardized fitness) is then selected (Goldberg and Deb 1991). If two individuals are to be selected (say, for participation in crossover), a second group of the specified size is chosen at random and the one with the best fitness is selected. Tournament selection with a group size of two is illustrated when two bulls fight over the right to mate with a given cow. We use a group size of seven for tournament selection because it lessens the probability that the current best-of-generation individual will not be selected to participate in at least one operation. All individuals remain in the population while selection is performed for the entire current generation. That is, the selection is always done with replacement (i.e., reselection). Tournament selection is used throughout this book, except for the runs in section 6.16, which were done prior to our decision to switch to tournament selection. Therefore, if the goal is to replicate the results reported in this book, tournament selection should be used as indicated. In retrospect, it is not clear that the decision to use tournament selection was beneficial, so we do not necessarily recommend this choice for future work.
• The method of selecting the second parent for a crossover is the same as the method for selecting the first parent (i.e., tournament selection with a group size of seven).
• The optional adjusted fitness measure (usually used in Genetic Programming) is irrelevant in the context of tournament selection.
• The technique of greedy over-selection (used in Genetic Programming for certain population sizes) is irrelevant in the context of tournament selection.
• The elitist strategy is not used.
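Tournament selection as described above can be sketched as follows. The function name tournament-select and the representation (one vector of individuals and a parallel vector of standardized fitness values) are assumptions made for illustration; they are not taken from the kernel. The sketch also assumes that the members of a group are drawn independently, so the same individual may appear more than once in a group.

```lisp
;; Hypothetical sketch of tournament selection.
;; POPULATION and FITNESSES are parallel vectors; lower standardized
;; fitness is better. The population itself is never modified, so
;; selection is always done with reselection.
(defun tournament-select (population fitnesses group-size)
  (let ((best-index (random (length population))))
    ;; Draw the remaining GROUP-SIZE - 1 members of the group and keep
    ;; track of the best one seen so far.
    (dotimes (i (1- group-size))
      (let ((candidate (random (length population))))
        (when (< (aref fitnesses candidate) (aref fitnesses best-index))
          (setf best-index candidate))))
    (aref population best-index)))
```

With the default group size, both parents of a crossover would be obtained by two separate calls of the form (tournament-select population fitnesses 7).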
The last two of these eight minor variables were not included in the list of
parameters in Genetic Programming (section 6.9).
• If there is any randomization involved in the creation of the fitness cases for the problem, the randomization occurs once and is fixed for all runs of the problem. The alternatives to this default choice are to randomize the fitness cases anew at the beginning of each run (used in tables 4.2 and 5.15), to randomize the fitness cases anew from generation to generation within the run (not used at all in this book), and to randomize the fitness cases anew for each fitness evaluation (not used at all in this book).
• The way of assigning types to the noninvariant points of an overall program is branch typing. The alternatives to this default choice are point typing (used in chapters 21 through 25) and like-branch typing (not used in this book, but considered in section 25.n of Genetic Programming).
The 19 minor parameters are generally not specifically mentioned in the
tableau unless there is deviation from the default value. However, because
automatically defined functions are central to this book, we do explicitly
mention the choice of the way of assigning types to noninvariant points in
each tableau with ADFs even when the default value (i.e., branch typing) is
being used.
Note that the default value of 17 for the maximum permissible depth,
D_created, for a program created by crossover is not a significant or relevant
constraint on program size. In fact, this choice permits potentially enormous
programs. For example, the largest permissible LISP program consisting
entirely of two-argument functions would contain 2^17 = 131,072 functions and
terminals. If four LISP functions and terminals are roughly equivalent to one
line of a program written in a conventional programming language, then this
largest permissible program corresponds to about 33,000 lines of code.
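The arithmetic above can be checked with a short sketch. The helper names count-points and full-binary-tree are hypothetical, and the sketch assumes that a lone terminal counts as one level; under that assumption a full tree of two-argument functions with 17 levels has 2^17 - 1 = 131,071 points, which agrees with the round figure of 2^17 used in the text.

```lisp
;; Hypothetical helper: number of functions and terminals (points)
;; in an S-expression.
(defun count-points (tree)
  (if (consp tree)
      (+ 1 (reduce #'+ (mapcar #'count-points (rest tree))))
      1))

;; Hypothetical helper: a full tree of two-argument AND calls with the
;; given number of LEVELS. Assumption: a lone terminal X is one level.
(defun full-binary-tree (levels)
  (if (<= levels 1)
      'x
      (let ((subtree (full-binary-tree (1- levels))))
        (list 'and subtree subtree))))
```

Evaluating (count-points (full-binary-tree 17)) counts the points of such a tree, and (floor (expt 2 17) 4) gives 32,768, the figure that the text rounds to about 33,000 lines.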
We do not use LISP S-expressions to represent the programs in the populations in three cases in this book: when the population size is 16,000 (as it
is only for the even-3-, 4-, 5-, and 6-parity problems in chapter 6), for the runs
of the even-5-parity problem with the 15 different architectures (chapter 7),
and for the runs of 1,000,000 programs in chapter 26. Instead, we use the array
method of representation for programs, in which the tree structure of individual
programs in the population is represented as a table. With the array method,
the size limit for the random individuals generated for the initial random
population is expressed in terms of the total number of points, rather than in
terms of the depth of the tree. (This is in contrast to our usual practice throughout this book, where the limits on program size are applied separately to each
branch of an overall program.) For example, for an overall program consisting of two function-defining branches and one result-producing branch, there
is an overall limit of 500 points for all the branches, a separate limit of 200 on
the result-producing branch, and a separate limit of 150 on each function-defining branch. These same limits are imposed on each potential offspring
of the crossover operation. If a particular offspring created by crossover exceeds the applicable limit, the crossover is aborted as to that particular offspring. If offspring 1 is unacceptable, parent 1 becomes offspring 1. Similarly,
if offspring 2 is unacceptable, parent 2 becomes offspring 2. When the size
limits are applied separately to each branch, the average size of programs in
generation 0 with automatically defined functions is much larger (by a multiple approximately equal to the total number of branches in the overall program) than the average size without automatically defined functions.
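The point limits quoted in the example above (150 per function-defining branch, 200 for the result-producing branch, and 500 overall) can be expressed as a simple check. This is a sketch only: the name within-point-limits-p is hypothetical, and for brevity the branches are given as S-expressions, whereas the array method itself stores each program as a table.

```lisp
;; Hypothetical check of the example point limits.
;; ADF-BRANCHES is a list of function-defining branch bodies and RPB is
;; the body of the result-producing branch.
(defun within-point-limits-p (adf-branches rpb)
  (labels ((points (tree)
             ;; Count the functions and terminals in TREE.
             (if (consp tree)
                 (+ 1 (reduce #'+ (mapcar #'points (rest tree))))
                 1)))
    (let ((adf-sizes (mapcar #'points adf-branches))
          (rpb-size (points rpb)))
      (and (every (lambda (size) (<= size 150)) adf-sizes)
           (<= rpb-size 200)
           (<= (+ rpb-size (reduce #'+ adf-sizes)) 500)))))
```

An offspring for which this predicate is false would be discarded and replaced by its parent, as described above.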
Table D.1 summarizes the default values used in this book for the
numerical parameters and qualitative variables for controlling runs of
genetic programming.
Many problems described herein undoubtedly could be solved better or
faster by means of different choices of these parameters and variables. No
detailed studies of the optimal choice for the control parameters, with or without automatically defined functions, have been made. Instead, the focus in
this book is on the demonstration of the main points stated in chapter 1. In
my view, the optimal choices for the control parameters become relevant only
after one has been persuaded of the basic usefulness of genetic programming
with automatically defined functions. In the present volume, this process
of persuasion would be undermined by frequent variation of the various
control parameters; the reader might come to attribute any demonstrated success of automatically defined functions to the fortuitous choice of the parameters. Of course, parameters are occasionally changed for certain specific
reasons: for illustrative purposes, for historical reasons, and when necessary.
Appendix E: Computer Implementation
of ADFs
In order to further explore the potential of genetic programming with automatically defined functions and to replicate the experimental results reported
herein, it is necessary to implement genetic programming with automatically
defined functions on a computer.
Common LISP code for implementing genetic programming appears in
appendixes B and C of Genetic Programming (Koza 1992a). That code and the
code in this appendix (along with such updates as may from time to time be
added) can be obtained on-line via anonymous FTP (file transfer protocol)
from the pub/genetic-programming directory at the FTP site
ftp.cc.utexas.edu as described in appendix C.
Automatically defined functions can be implemented by modifying the
code in appendixes B and C of Genetic Programming in light of the following
five considerations.
First, since each overall program in the population consists of one or more
function-defining branches as well as a result-producing branch, a constrained
syntactic structure must be created to accommodate the multi-branch overall
program.
Second, the terminal and function sets differ among the branches. One difference is that the function set of the result-producing branch contains at least
one automatically defined function, whereas at least one of the function-defining branches does not refer to any automatically defined function. Another
difference is that there are no dummy variables in the result-producing branch.
It is frequently (but not necessarily) true that the terminal set of the function-defining branches contains dummy variables (formal parameters), although
the artificial ant problem of chapter 12 using side-effecting functions illustrates that function-defining branches do not necessarily have any dummy
variables. It is also frequently (but not necessarily) true that the terminal set of
the function-defining branches does not contain any of the actual variables of
the problem. The terminal set of the result-producing branch frequently (but
not necessarily) contains the actual variables of the problem.
Third, generation 0 of the population must be created in conformity with
the desired constrained syntactic structure. Specifically, each branch of each
overall program in the population must be composed of the functions and
terminals appropriate to that branch.
Fourth, crossover must be performed so as to preserve the syntactic validity of all offspring. Crossover is limited to the work-performing bodies of the
various branches. Structure-preserving crossover is implemented by allowing any point in the work-performing body of any branch of the overall program to be chosen, without restriction, as the crossover point of the first parent.
Once the crossover point of the first parent has been chosen in structure-preserving crossover, the choice of the crossover point of the second parent is
restricted to points of the same type. Types are assigned to the noninvariant
points of an overall program in one of three ways (branch typing, point typing, and like-branch typing) described in section 4.8.
Fifth, when the result-producing branch is being evaluated, it must be able
to invoke the appropriate automatically defined functions within the overall
program.
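The fourth consideration can be illustrated for the default case of branch typing, where the type of a point is simply the branch in which it lies. The following is a rough sketch, not the kernel's crossover code. The name choose-crossover-points and the representation of an overall program as a list of branch bodies are assumptions, and the sketch ignores the 90%/10% split between internal and external points.

```lisp
;; Hypothetical sketch of choosing crossover points under branch typing.
;; PARENT-1 and PARENT-2 are lists of branch bodies in the same order.
;; Returns four values: branch index and point index for PARENT-1, then
;; branch index and point index for PARENT-2.
(defun choose-crossover-points (parent-1 parent-2)
  (labels ((points (tree)
             ;; Count the functions and terminals in TREE.
             (if (consp tree)
                 (+ 1 (reduce #'+ (mapcar #'points (rest tree))))
                 1)))
    (let* ((sizes (mapcar #'points parent-1))
           (pick (random (reduce #'+ sizes)))
           (branch 0))
      ;; Walk the branches until PICK falls inside one of them, so that
      ;; every point of the first parent is equally likely to be chosen.
      (loop while (>= pick (nth branch sizes))
            do (decf pick (nth branch sizes))
               (incf branch))
      ;; The second parent's point is restricted to the same branch.
      (values branch
              pick
              branch
              (random (points (nth branch parent-2)))))))
```

Because both returned branch indexes are always equal, a subtree taken from, say, ADF0 of one parent can only be inserted into ADF0 of the other, so each offspring branch still uses only its own function and terminal sets.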
This appendix contains Common LISP code (Steele 1990) for a simple version of genetic programming with automatically defined functions. This code
is based as closely as possible on the LISP code from appendixes B and C of
Genetic Programming. Since our experience with that code has been that most
users used the code as a guide to write their own code (often in another programming language), the LISP code in this appendix is written in an intentionally very simple style so that it can be easily understood by a user who
has only minimal knowledge of LISP. The user will find many opportunities
to optimize this code and make it more general and flexible in the process of
using it or translating it to another language. The code is divided into a problem specific part and a problem independent kernel. Implementation of automatically defined functions requires changes to both the problem-specific
part and the kernel. We have tested the code in this appendix on the Texas
Instruments Explorer™ II+ computer using its Common LISP environment.
The code in this appendix illustrates the problem of symbolic regression of
the Boolean even-5-parity problem (chapters 6 and 7) using two three-argument automatically defined functions.
In order to run a different problem, the user need only modify a relatively
small amount of code. If the user's new problem involves two three-argument automatically defined functions, only the problem specific part of the
code here need be modified. Techniques for modifying the problem specific
code to handle different problems were illustrated in Genetic Programming
(appendix B) with three different problems, so our focus here will be on the
aspects of the code that differ depending on whether or not automatically
defined functions are being used.
The potential user of this code should be alert to the fact that almost all the
problems in this book require considerably more computer resources to run
than a typical problem described in Genetic Programming. The reason is that
problems that contain a sufficient amount of internal regularity to benefit
from automatically defined functions are inherently more complex. Very
simple problems do not need, and do not benefit from, automatically defined
functions. The most common population size in this book is 4,000 versus only
500 in Genetic Programming. The size of the population, of course, impacts
both computer time and memory. In addition, the number of fitness cases is
also generally higher in this book because more complex problems usually
require more fitness cases. Moreover, many of the problems herein use random constants and successful runs with random constants generally require
larger population sizes. In addition, because we chose to apply the depth
restrictions independently to the body of each branch (for all population sizes
below 16,000), the programs in this book tend to be much larger and hence more
demanding of computer time and memory than those without automatically
defined functions. Finally, the interpretation and execution of programs with
automatically defined functions takes more time than programs without them.
The Texas Instruments Explorer™ II+ computer that we used to run all the
problems of this book was of late 1980s vintage. Except for the simple problems in early chapters, a single run of most problems described in this book
took between half a day and several days on one processor of this excellent, but now-outdated, machine. Comparing machines is always uncertain.
Comparisons are especially uncertain when one machine is a LISP machine
and the other is not. The overall performance of our machine is, roughly,
comparable to a Sun IPX™ when running a commercial software version of
LISP.
The user should also keep in mind the fact that a sufficient population size
is absolutely essential in genetic methods. Genetic methods start to perform
only when a sufficient population size with a sufficient variety of genetic
material is available. If an insufficient population size is used, virtually no
results are produced.
E.1 PROBLEM SPECIFIC CODE FOR BOOLEAN EVEN-5-PARITY PROBLEM
As previously mentioned, there are six major steps in preparing to use genetic
programming with automatically defined functions, namely determining
(1) the set of terminals for each branch,
(2) the set of functions for each branch,
(3) the fitness measure,
(4) the parameters and variables for controlling the run,
(5) the method for designating a result and the criterion for terminating a run,
and
(6) the architecture of the overall program.
The problem specific part of the LISP code in this appendix closely parallels these six major steps. It is relatively straightforward to adapt the problem
specific part of the code in this appendix to a new problem by visualizing a
problem in terms of these steps.
The sixth major step is peculiar to automatically defined functions and
should be performed first.
The sixth major step involves determining
(a) the number of function-defining branches,
(b) the number of arguments possessed by each function-defining branch,
and
(c) if there is more than one function-defining branch, the nature of the
hierarchical references (if any) allowed between the function-defining
branches.
For the Boolean even-5-parity problem, the sixth major step
consists of deciding that there will be two function-defining branches (for
automatically defined functions ADF0 and ADF1); that ADF0 and ADF1 will
each take three arguments; and that the second automatically defined function, ADF1, is permitted to refer hierarchically to the first automatically defined function, ADF0. The fact that there are two function-defining branches
and one result-producing branch in each overall program in the population
means that the terminal set and the function set must be specified for each of
these three branches.
Having performed the sixth major step, we can proceed to the other five
major steps.
The problem specific part of the LISP code requires writing code for the
following 12 types of items:
(1) defvar declaration(s),
(2) a group of functions whose names begin define-terminal-set-for-EVEN-5-PARITY for each function-defining branch and the result-producing branch of the overall program,
(3) a group of functions whose names begin define-function-set-for-EVEN-5-PARITY for each function-defining branch and the result-producing branch of the overall program,
(4) if applicable, user-defined problem specific function(s),
(5) defstruct EVEN-5-PARITY-fitness-case,
(6) define-fitness-cases-for-EVEN-5-PARITY,
(7) EVEN-5-PARITY-wrapper,
(8) evaluate-standardized-fitness-for-EVEN-5-PARITY,
(9) define-parameters-for-EVEN-5-PARITY,
(10) define-termination-criterion-for-EVEN-5-PARITY,
(11) the function EVEN-5-PARITY, and
(12) the invocation using run-genetic-programming-system.
The first major step in preparing to use genetic programming is to identify
the set of terminals and the second major step is to identify the function set.
When automatically defined functions are involved, these two steps must be
applied to each branch of the overall program. That is, items (2) and (3) require
that code be written for each branch of the overall program. For this problem,
each of the three branches is composed of different ingredients.
The terminal set, T_rpb, for the result-producing branch consists of the five
actual variables of the problem, namely the five Boolean variables D0, D1, D2,
D3, and D4.
T_rpb = {D0, D1, D2, D3, D4}.
The function set, F_rpb, of the result-producing branch for this problem will
contain four primitive Boolean functions and two automatically defined functions, ADF0 and ADF1.
F_rpb = {ADF0, ADF1, AND, OR, NAND, NOR}
with an argument map for this function set of
{3, 3, 2, 2, 2, 2}.
The terminal set, T_adf0, for the first function-defining branch that defines
automatically defined function ADF0 consists of three dummy variables.
T_adf0 = {ARG0, ARG1, ARG2}.
The function set, F_adf0, for ADF0 consists of the following set of four primitive Boolean functions:
F_adf0 = {AND, OR, NAND, NOR}
with an argument map for this function set of
{2, 2, 2, 2}.
The terminal set, T_adf1, for the second function-defining branch, defining
ADF1, consists of three dummy variables (i.e., it is the same as for ADF0).
T_adf1 = {ARG0, ARG1, ARG2}.
The function set, F_adf1, for ADF1 consists of the set of four primitive Boolean functions and the already-defined function ADF0. That is, the function-defining branch for ADF1 is capable of hierarchically calling the already-defined
function ADF0.
F_adf1 = {ADF0, AND, OR, NAND, NOR}
with an argument map for this function set of
{3, 2, 2, 2, 2}.
Note that the actual variables of the problem, D0, D1, D2, D3, and D4, do
not appear in either function-defining branch of this problem and that the
result-producing branch does not contain any dummy variables, such as ARG0,
ARG1, and ARG2. Also, note that although we use the names ARG0, ARG1, and
ARG2 for the dummy variables of both ADF0 and ADF1, these dummy
variables only have a defined value locally within a particular automatically
defined function.
We start by declaring each variable in the terminal set of the result-producing branch and the function-defining branches as global variables. Thus, the
first of the 12 items that we must write in the problem specific part of the code
consists of the following eight declarations:
(defvar d0)
(defvar d1)
(defvar d2)
(defvar d3)
(defvar d4)
(defvar arg0)
(defvar arg1)
(defvar arg2)
In addition, we need two additional global variables, *ADF0* and *ADF1*,
associated with the two automatically defined functions, and definitions
for them.
(defvar *adf0*)
(defun adf0 (arg0 arg1 arg2)
  (eval *adf0*))
(defvar *adf1*)
(defun adf1 (arg0 arg1 arg2)
  (eval *adf1*))
We place these declarations and definitions at the beginning of the file
containing the LISP code for this problem.
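The following sketch shows how these declarations behave for ADF0. Because arg0, arg1, and arg2 are declared with defvar, the parameters of adf0 are bound dynamically, so the body stored in *adf0* sees the arguments when it is passed to eval. The body used here is a made-up example chosen only for illustration; in a run, the kernel would install the function-defining branch of the program being evaluated.

```lisp
(defvar arg0)
(defvar arg1)
(defvar arg2)
(defvar *adf0*)

(defun adf0 (arg0 arg1 arg2)
  ;; ARG0, ARG1, and ARG2 are special variables, so EVAL sees the
  ;; values passed to this call.
  (eval *adf0*))

;; Hypothetical body for ADF0 (an assumption for illustration only).
(setf *adf0* '(and arg0 (or arg1 arg2)))
```

With this body, (adf0 t nil t) evaluates to T and (adf0 nil t t) evaluates to NIL. After either call returns, arg0 is unbound again at top level, which is the sense in which the dummy variables have a defined value only locally within an automatically defined function.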
Since there are multiple branches to each overall program in the population, we now create a LISP function to define the terminal set for each function-defining branch and the result-producing branch of the overall program.
Each overall program here consists of ADF0, ADF1, and one result-producing
branch, RPB. Each of the functions for defining a terminal set returns the list
of the terminals used in a particular branch of the overall program. Thus, the
second group of items in the problem specific part of the LISP code that we
must write consists of three functions for defining the terminal sets of the
three branches of the overall program.
The function for defining the terminal set of the single result-producing
branch, RPB, is as follows:
(defun define-terminal-set-for-EVEN-5-PARITY-RPB ()
  (values '(d4 d3 d2 d1 d0))
  )
The function for defining the terminal set of the function-defining branch
ADF0 is as follows:
(defun define-terminal-set-for-EVEN-5-PARITY-ADF0 ()
  (values '(arg0 arg1 arg2))
  )
The function for defining the terminal set of the function-defining branch
ADF1 is as follows:
(defun define-terminal-set-for-EVEN-5-PARITY-ADF1 ()
  (values '(arg0 arg1 arg2))
  )
Note that, for clarity, we explicitly highlight the value(s) returned by each
function by using a values form.
The third group of items in the problem specific part of the LISP code that
we must write consists of three functions for specifying the function sets and
the argument maps of the three branches of the overall program.
The function for defining the function set and the argument map of the
result-producing branch, RPB, is as follows:
(defun define-function-set-for-EVEN-5-PARITY-RPB ()
  (values '(and or nand nor ADF0 ADF1)
          '(2 2 2 2 3 3)
          )
  )
The function for defining the function set and the argument map of the
first function-defining branch, ADF0, is as follows:
(defun define-function-set-for-EVEN-5-PARITY-ADF0 ()
  (values '(and or nand nor)
          '(2 2 2 2)
          )
  )
Since ADF1 is permitted to refer hierarchically to ADF0, ADF0 appears in
the function set of the second function-defining branch, ADF1. Thus, the function for defining the function set and the argument map of the second function-defining branch, ADF1, is as follows:
(defun define-function-set-for-EVEN-5-PARITY-ADF1 ()
  (values '(and or nand nor ADF0)
          '(2 2 2 2 3)
          )
  )
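The function set and the argument map are returned as two parallel lists, so the number of arguments taken by a function is found at the same position as its name. The sketch below shows one way a caller might pair them up; the helper arity-of is hypothetical and is not part of the kernel.

```lisp
(defun define-function-set-for-EVEN-5-PARITY-ADF1 ()
  (values '(and or nand nor ADF0)
          '(2 2 2 2 3)))

;; Hypothetical helper: number of arguments taken by FUNCTION-NAME, or
;; NIL if FUNCTION-NAME is not in FUNCTION-SET.
(defun arity-of (function-name function-set argument-map)
  (let ((index (position function-name function-set)))
    (when index
      (nth index argument-map))))

;; Example use: receive both returned values and look up ADF0.
(defun arity-of-adf0-within-adf1 ()
  (multiple-value-bind (function-set argument-map)
      (define-function-set-for-EVEN-5-PARITY-ADF1)
    (arity-of 'adf0 function-set argument-map)))
```

Here (arity-of-adf0-within-adf1) returns 3, while a lookup of ADF1 in the same function set returns NIL, reflecting the fact that ADF1 may call ADF0 but not itself.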
For purposes of programming, we treat all zero-argument functions as terminals. Note that, for purposes of exposition in the text of this book, we treat
zero-argument side-effecting functions as terminals, but treat zero-argument
ADFs as functions.
Many of the 12 items that we must write in the problem specific part of the code
for a problem when using automatically defined functions are written in much
the same way as when automatically defined functions are not being used. We
include them here for completeness; however, we describe some of them only briefly.
The fourth item in the problem specific part of the LISP code that we must
write consists of writing the definition of any problem specific functions (if
any) peculiar to the problem. For this problem, the primitive functions, NAND
and NOR, appearing in the function sets of all three branches require definition. The multi-argument ODD-PARITY function (used later to compute the
target even-5-parity function) is also defined here.
(defun NAND (a b)
  (not (and a b))
  )
(defun NOR (a b)
  (not (or a b))
  )
(defun ODD-PARITY (&rest args)
  (let ((result nil))
    (dolist (value args result)
      (when value (setf result (not result)))))
  )
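A brief usage sketch may help in reading ODD-PARITY: it flips its result once for every true argument, so it returns T exactly when an odd number of arguments are true. The helper even-5-parity-target below is hypothetical; it merely names the negation that is used later to compute the target value of each fitness case.

```lisp
(defun odd-parity (&rest args)
  (let ((result nil))
    ;; Flip RESULT once for each true argument.
    (dolist (value args result)
      (when value (setf result (not result))))))

;; Hypothetical helper: T when an even number of the five arguments are T.
(defun even-5-parity-target (d4 d3 d2 d1 d0)
  (not (odd-parity d4 d3 d2 d1 d0)))
```

For example, (odd-parity t nil t) is NIL because two arguments are true, and (even-5-parity-target nil nil nil nil nil) is T because zero is an even number of true arguments.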
The third major step in preparing to use genetic programming is identifying the fitness measure for evaluating how good a given computer program
is at solving the problem at hand. The even-5-parity problem is typical of
most problems in that fitness is computed using a number of fitness cases. We
establish the fitness cases at the beginning of the run. The kernel then loops
over each individual program in the population calling on the user-specified
fitness function to evaluate the fitness of each individual. If the fitness measure requires fitness cases, the fitness function loops over the fitness cases in
order to evaluate the fitness of each particular S-expression from the population.
We store the fitness cases in an array, each element of which corresponds to
one fitness case. Each fitness case is implemented as a record structure. It is
convenient to store the values of all the independent variables for a given
fitness case in the record for that fitness case along with any dependent variables (the "answer") for that fitness case. Since the Boolean even-5-parity problem is a problem of symbolic regression involving five independent variables
and one dependent variable, there are six variables for this problem.
The fifth item in the problem specific part of the LISP code that we must
write is the defstruct record structure declaration for the fitness cases of
this problem:
(defstruct EVEN-5-PARITY-fitness-case
  d0
  d1
  d2
  d3
  d4
  target
  )
The sixth item in the problem specific part of the LISP code that we must write
is the function called define-fitness-cases-for-EVEN-5-PARITY for this problem. The fitness cases for this problem consist of all 2^5 = 32 possible combinations
of the five Boolean arguments, d0, d1, d2, d3, and d4, so the *number-of-fitness-cases* is 32. These fitness cases are created with five nested dolist
functions, each looping over the list (t nil). Maximum raw fitness is 32 matches.
Standardized fitness is 32 minus raw fitness. The target is defined by using the
negation of the multi-argument ODD-PARITY function.
(defun define-fitness-cases-for-EVEN-5-PARITY ()
  (let (fitness-case fitness-cases index)
    (setf fitness-cases (make-array *number-of-fitness-cases*))
    (format t "~%Fitness cases")
    (setf index 0)
    (dolist (d4 '(t nil))
      (dolist (d3 '(t nil))
        (dolist (d2 '(t nil))
          (dolist (d1 '(t nil))
            (dolist (d0 '(t nil))
              (setf fitness-case
                    (make-EVEN-5-PARITY-fitness-case)
                    )
              (setf (EVEN-5-PARITY-fitness-case-d0 fitness-case)
                    d0)
              (setf (EVEN-5-PARITY-fitness-case-d1 fitness-case)
                    d1)
              (setf (EVEN-5-PARITY-fitness-case-d2 fitness-case)
                    d2)
              (setf (EVEN-5-PARITY-fitness-case-d3 fitness-case)
                    d3)
              (setf (EVEN-5-PARITY-fitness-case-d4 fitness-case)
                    d4)
              (setf (EVEN-5-PARITY-fitness-case-target
                      fitness-case)
                    (not (ODD-PARITY d4 d3 d2 d1 d0))
                    )
              (setf (aref fitness-cases index) fitness-case)
              (incf index)
              (format t
                      "~% ~3D ~10S~10S~10S~10S~10S~15S"
                      index d4 d3 d2 d1 d0
                      (EVEN-5-PARITY-fitness-case-target
                        fitness-case)))))))
    (values fitness-cases)
    )
  )
The seventh item in the problem specific part of the code that we must
write for this problem is the function EVEN-5-PARITY-wrapper. In this problem,
the wrapper (output interface) merely returns what is produced by the
result-producing branch of the program, namely result-from-program.
(defun EVEN-5-PARITY-wrapper (result-from-program)
  (values result-from-program)
  )
The eighth item in the problem specific part of the LISP code that we must
write is the function called evaluate-standardized-fitness-for-EVEN-5-PARITY. This function receives two arguments from the kernel, namely the
individual computer program from the population which is to be evaluated
(called program) and the set of fitness cases (called fitness-cases). This
function returns two values, namely the standardized fitness of the individual and the number of hits. Note that prior to the evaluation (via eval) of the
result-producing branch of program, it is necessary to set each of the five
independent variables of this problem (represented by the global variables
d0, d1, d2, d3, and d4). The Boolean flag match-found is defined as a result of
testing value-from-program for equality (i.e., eq) with target-value.
(defun evaluate-standardized-fitness-for-EVEN-5-PARITY
       (program fitness-cases)
  (let (raw-fitness hits standardized-fitness target-value
        match-found value-from-program fitness-case rpb)
    (setf raw-fitness 0.0)
    (setf hits 0)
    (setf rpb (ADF-program-RPB program))
    (setf *adf0* (ADF-program-ADF0 program))
    (setf *adf1* (ADF-program-ADF1 program))
    (dotimes (index *number-of-fitness-cases*)
      (setf fitness-case (aref fitness-cases index))
      (setf d0 (EVEN-5-PARITY-fitness-case-d0 fitness-case))
      (setf d1 (EVEN-5-PARITY-fitness-case-d1 fitness-case))
      (setf d2 (EVEN-5-PARITY-fitness-case-d2 fitness-case))
      (setf d3 (EVEN-5-PARITY-fitness-case-d3 fitness-case))
      (setf d4 (EVEN-5-PARITY-fitness-case-d4 fitness-case))
      (setf target-value
            (EVEN-5-PARITY-fitness-case-target fitness-case))
      (setf value-from-program
            (EVEN-5-PARITY-wrapper (eval rpb)))
      (setf match-found (eq target-value value-from-program))
      (incf raw-fitness (if match-found 1.0 0.0))
      (when match-found (incf hits)))
    (setf standardized-fitness (- 32 raw-fitness))
    (values standardized-fitness hits)
    )
  )
Except for very simple problems, the bulk of computer time is consumed
during the execution of the evaluate-standardized-f itness-f or*EVENAppendix E
I^^IE
/ cot- f
( incf
(when
670
s-pARrry function. Thus, the user should focus his optimization efforts on
the fihress measure and any functions that may be called when a Program
from the population is measured for fitness'
For Boolean problems, the user can save an enormous amount of computer
time with one of two possible optimization techniques. One technique
involves identifying the particular three-argument Boolean functions that are
performed by the bodies of ADF0 and ADF1; creating two eight-row lookup
tables for ADF0 and ADF1; and thereafter using the lookup tables in lieu of
evaluating the entire bodies of ADF0 and ADF1 for each fitness case. A second
technique involves converting the Boolean expressions in ADF0 and ADF1 to
disjunctive normal form (DNF) and compiling the resulting program. Both
techniques are especially valuable when there are one or more hierarchical
references between the function-defining branches because the hierarchical
references are, in effect, eliminated.
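The first technique can be sketched as follows. This is a minimal illustration and not code from the kernel: the names build-lookup-table, lookup, and *even-2-parity-table* are hypothetical, and the body being tabulated is assumed to refer only to the dummy variables arg0, arg1, and arg2 and to use ordinary LISP functions (and, or, and not here).

```lisp
;; A sketch only: BUILD-LOOKUP-TABLE and LOOKUP are hypothetical
;; helpers, not functions from the kernel in this appendix.
(defun build-lookup-table (body)
  "Evaluates BODY, an S-expression over ARG0, ARG1, and ARG2, for each
   of the 8 combinations of Boolean arguments and returns the results
   as an 8-element vector."
  (let ((table (make-array 8))
        (fn (compile nil `(lambda (arg0 arg1 arg2)
                            (declare (ignorable arg0 arg1 arg2))
                            ,body))))
    (dotimes (row 8 table)
      (flet ((bit-set-p (position)
               (if (logbitp position row) t nil)))
        ;; ARG0 is taken from the high-order bit of the row number.
        (setf (aref table row)
              (funcall fn (bit-set-p 2) (bit-set-p 1) (bit-set-p 0)))))))

(defun lookup (table arg0 arg1 arg2)
  "Returns the tabulated value for the given Boolean arguments."
  (aref table (+ (if arg0 4 0) (if arg1 2 0) (if arg2 1 0))))

;; Example: tabulate the even-2-parity of ARG0 and ARG1 (ARG2 unused).
(defparameter *even-2-parity-table*
  (build-lookup-table
    '(or (and arg0 arg1) (and (not arg0) (not arg1)))))
```

With the table built once per individual, each of the 32 fitness cases costs one array reference per ADF call, for example (lookup *even-2-parity-table* t t nil), instead of a full evaluation of the ADF body.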
The fourth major step in preparing to use genetic programming is
determining the values of certain parameters for controlling the run.
The ninth item in the problem specific part of the code that we must write
is the define-parameters-for-EVEN-5-PARITY function. This function is
used to assign the values to ten parameters that control the run.
(defun define-parameters-for-EVEN-5-PARITY ()
  (setf *number-of-fitness-cases* 32)
  (setf *max-depth-for-new-individuals* 5)
  (setf *max-depth-for-new-subtrees-in-mutants* 4)
  (setf *max-depth-for-individuals-after-crossover* 17)
  (setf *reproduction-fraction* 0.1)
  (setf *crossover-at-any-point-fraction* 0.2)
  (setf *crossover-at-function-point-fraction* 0.7)
  (setf *method-of-selection* :tournament)
  (setf *tournament-size* 7)
  (setf *method-of-generation* :ramped-half-and-half)
  (values)
)
The *number-of-fitness-cases*, which depends on the problem, is set
in the second line. The remaining lines contain the values of the numerical
parameters and the qualitative parameters for controlling the run shown as
default values in appendix D. The *tournament-size* is set to 7 here.
Finally, the fifth major step in preparing to use genetic programming
involves determining the criterion for terminating a run and the method for
designating the result of a run.
The tenth item in the problem specific part of the code is the
define-termination-criterion-for-EVEN-5-PARITY function.
(defun define-termination-criterion-for-EVEN-5-PARITY
       (current-generation
        maximum-generations
        best-standardized-fitness
        best-hits)
  (declare (ignore best-standardized-fitness))
  (values (or (>= current-generation maximum-generations)
              (>= best-hits *number-of-fitness-cases*)
          )
  )
)
The eleventh item in the problem specific part of the LISP code that we
must write is a function called EVEN-5-PARITY which informs the kernel about
the various functions we have just written for this problem. The name of this
function establishes the name of the problem.
(defun EVEN-5-PARITY ()
  (values 'define-function-set-for-EVEN-5-PARITY-ADF0
          'define-function-set-for-EVEN-5-PARITY-ADF1
          'define-function-set-for-EVEN-5-PARITY-RPB
          'define-terminal-set-for-EVEN-5-PARITY-ADF0
          'define-terminal-set-for-EVEN-5-PARITY-ADF1
          'define-terminal-set-for-EVEN-5-PARITY-RPB
          'define-fitness-cases-for-EVEN-5-PARITY
          'evaluate-standardized-fitness-for-EVEN-5-PARITY
          'define-parameters-for-EVEN-5-PARITY
          'define-termination-criterion-for-EVEN-5-PARITY
  )
)
We now illustrate a run of genetic programming by calling a function called
run-genetic-programming-system. This function takes four mandatory
arguments, namely
(1) the name of the problem (e.g., EVEN-5-PARITY),
(2) the randomizer seed (which should be greater than 0.0 and less than or
equal to 1.0),
(3) the maximum number G of generations to be run, and
(4) the population size M.
Thus, the twelfth and final item in the problem specific part of the code that
we must write is the one line required to execute this problem by invoking
the function run-genetic-programming-system, with four mandatory
arguments as follows:
(run-genetic-programming-system 'EVEN-5-PARITY 1.0 51 4000)
Evaluation of the above would result in a run of the EVEN-5-PARITY problem,
using the randomizer seed of 1.0 with a maximum number G of generations
of 51 (i.e., generation 0 plus 50 additional generations) with a population
size, M, of 4,000.
The randomizer seed is an explicit argument to this function in order to
give the user direct control over the randomizer. By re-using a seed, the user
can obtain the same results (e.g., for debugging or so that interesting runs can
be replicated). By using different seeds on different runs, the user will obtain
different results.
After the above four mandatory arguments, this function can take up to M
additional optional arguments. Each optional argument represents a primed
individual that will be seeded into the initial population. If fewer than M such
primed individuals are provided, the initial population will contain all the
primed individuals that are provided and will then be filled out with randomly created individuals.
One useful test that the user can perform is to verify that the correct fitness
is computed for a single primed individual consisting of a correct program
for the even-5-parity function.
(run-genetic-programming-system
  'EVEN-5-PARITY 1.0 1 1
  (make-ADF-program
    :adf0 '(or (and arg0 arg1)
               (and (nand arg0 arg0)
                    (nand arg1 arg1)))
    :adf1 '(nand (or (and arg0 arg1)
                     (and (nand arg0 arg0) (nand arg1 arg1)))
                 (or (and arg0 arg1)
                     (and (nand arg0 arg0)
                          (nand arg1 arg1))))
    :rpb '(adf1 (adf0 (adf0 d0 d1 d0) (adf0 d2 d3 d0) d0)
                d4 d0)))
ADF0 here is equivalent to the even-2-parity and ADF1 is equivalent to the
odd-2-parity.
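The correctness of this primed individual can also be checked by hand, outside the kernel. The sketch below restates the three branches as ordinary LISP functions and counts the fitness cases on which the result-producing branch agrees with even-5-parity. The names nand, seeded-rpb, and count-hits are assumptions made for this illustration; in particular, this nand merely stands in for whatever definition the problem's function set supplies.

```lisp
;; NAND is not a standard Common LISP function; this definition is an
;; assumption standing in for the one in the problem's function set.
(defun nand (a b) (not (and a b)))

(defun adf0 (arg0 arg1 arg2)
  (declare (ignore arg2))
  (or (and arg0 arg1)
      (and (nand arg0 arg0) (nand arg1 arg1))))

(defun adf1 (arg0 arg1 arg2)
  (declare (ignore arg2))
  (nand (or (and arg0 arg1)
            (and (nand arg0 arg0) (nand arg1 arg1)))
        (or (and arg0 arg1)
            (and (nand arg0 arg0) (nand arg1 arg1)))))

(defun seeded-rpb (d0 d1 d2 d3 d4)
  "The result-producing branch of the primed individual."
  (adf1 (adf0 (adf0 d0 d1 d0) (adf0 d2 d3 d0) d0)
        d4 d0))

(defun count-hits ()
  "Counts the fitness cases (out of 32) on which SEEDED-RPB agrees
   with the even-5-parity function."
  (let ((hits 0))
    (dotimes (case-number 32 hits)
      (let ((bits (mapcar (lambda (i) (logbitp i case-number))
                          '(0 1 2 3 4)))
            ;; Even-5-parity is true when an even number of the
            ;; five inputs are true.
            (target (evenp (logcount case-number))))
        ;; NOT normalizes both sides to T or NIL before comparing.
        (when (eq (not target) (not (apply #'seeded-rpb bits)))
          (incf hits))))))
```

Evaluating (count-hits) returns 32, which is the number of hits the kernel should report for this individual.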
The user can verify the correct operation of his program by running this
problem a number of times.
We have verified the computer code in this appendix by comparing its
operation with our computer code on our Texas Instruments Explorer II+
computer. We made 32 runs of the even-5-parity problem using the computer
code in this appendix.
Figure E.1 shows the performance curves generated from these 32 runs
with the computer code contained in this appendix for the even-5-parity
problem with two three-argument automatically defined functions. The
population size, M, is 4,000. The cumulative probability of success is 62% at generation
19 and 78% at generation 50. The numbers 19 and 400,000 in the oval indicate
that, if this problem is run through to generation 19, processing a total of
E_with = 400,000 individuals (i.e., 4,000 x 20 generations x 5 runs) is sufficient
to yield a solution to this problem with 99% probability.
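The arithmetic behind the figure of 400,000 can be reproduced directly. The sketch below assumes the book's usual definition of the number of independent runs required, R(z) = ceiling(log(1 - z) / log(1 - P)), where P is the cumulative probability of success by the chosen generation and z is the desired probability of at least one success. The function names runs-required and individuals-to-process are hypothetical.

```lisp
(defun runs-required (p z)
  "Number of independent runs needed so that at least one run succeeds
   with probability Z, given a per-run success probability P."
  ;; CEILING with two arguments rounds the quotient up to an integer.
  (values (ceiling (log (- 1 z)) (log (- 1 p)))))

(defun individuals-to-process (m i p z)
  "M individuals x (I + 1) generations x the required number of runs."
  (* m (+ i 1) (runs-required p z)))
```

With P = 0.62 and z = 0.99, (runs-required 0.62 0.99) is 5, so (individuals-to-process 4000 19 0.62 0.99) is 4,000 x 20 x 5 = 400,000.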
Figure 7.6 summarizes the results of 96 runs of this problem with our
computer code on our Texas Instruments Explorer II+ computer. A comparison of
figures 7.6 and E.1 indicates that the rising cumulative probability curve is
virtually the same. Moreover, figure 7.6 reports that the computational effort,
E_with, for the 96 runs is also 400,000.
[Plot: curves labeled "With Defined Functions", with Generation on the horizontal axis and the point (50, 78%) marked.]
Figure E.1 Performance curves generated from 32 runs using the computer code in this
appendix for the even-5-parity problem showing that E_with = 400,000 with ADFs having a
fixed argument map of {3, 3}.
E.2 KERNEL
The kernel is the generic part of the simple LISP code for genetic programming. In
this appendix, we briefly provide an overview of how the kernel works and some
basic information to the user who may want to modify the kernel.
The discussion of the kernel is divided into 12 parts.
First, the kernel contains a defstruct declaration to declare the data
structure representing each individual in the population. The defstruct form in
LISP is similar to declarations of record types in other programming languages.
The program slot in this record type is the individual in the population. There
are four additional slots in this record type, namely for the
standardized-fitness, adjusted-fitness, normalized-fitness, and hits of the
individual program in question.
(defstruct individual
  program
  (standardized-fitness 0)
  (adjusted-fitness 0)
  (normalized-fitness 0)
  (hits 0))
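As a brief illustration of what this declaration provides, defstruct automatically defines a constructor (make-individual), one accessor per slot, and a copier (copy-individual, which the kernel later uses to save the best-of-run individual). The variable *example-individual* and the program it holds are invented for this sketch.

```lisp
(defstruct individual
  program
  (standardized-fitness 0)
  (adjusted-fitness 0)
  (normalized-fitness 0)
  (hits 0))

;; A made-up individual; only the PROGRAM slot is supplied, so the
;; other slots take their default value of 0.
(defparameter *example-individual*
  (make-individual :program '(and d0 d1)))

;; Accessors can be used with SETF to record results.
(setf (individual-hits *example-individual*) 16)

;; COPY-INDIVIDUAL makes a new record with the same slot values.
(defparameter *example-copy* (copy-individual *example-individual*))
```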
The following is a record structure declaration for the programs. The print
method below allows the user to print out a program in the form used
throughout this book.
(defstruct
  (adf-program
    (:print-function
      (lambda (instance stream depth)
        (declare (ignore depth))
        (format stream
                "(progn (defun ADF0 (ARG0 ARG1 ARG2)~
                 ~%         (values ~S))~
                 ~%       (defun ADF1 (ARG0 ARG1 ARG2)~
                 ~%         (values ~S))~
                 ~%       (values ~S))"
                (adf-program-adf0 instance)
                (adf-program-adf1 instance)
                (adf-program-rpb instance)))))
  adf0
  adf1
  rpb)
Second, the kernel contains ten defvar declarations for 10 global variables
and binds each of them to :unbound. These are the 10 parameters that the
user is expected to set in the define-parameters-for-... function
(define-parameters-for-EVEN-5-PARITY in this example) described
in the previous section.
(defvar *number-of-fitness-cases* :unbound
  "The number of fitness cases")
(defvar *max-depth-for-new-individuals* :unbound
  "The maximum depth for individuals of the initial
   random generation")
(defvar *max-depth-for-individuals-after-crossover* :unbound
  "The maximum depth of new individuals created by crossover")
(defvar *reproduction-fraction* :unbound
  "The fraction of the population that will experience fitness
   proportionate reproduction (with reselection)
   during each generation")
(defvar *crossover-at-any-point-fraction* :unbound
  "The fraction of the population that will experience
   crossover at any point in the tree (including terminals)
   during each generation")
(defvar *crossover-at-function-point-fraction* :unbound
  "The fraction of the population that will experience
   crossover at a function (internal) point in the tree
   during each generation.")
(defvar *max-depth-for-new-subtrees-in-mutants* :unbound
  "The maximum depth of new subtrees created by mutation")
(defvar *method-of-selection* :unbound
  "The method of selecting individuals in the population.
   Either :fitness-proportionate, :tournament or
   :fitness-proportionate-with-over-selection.")
(defvar *tournament-size* :unbound
  "The group size to use when doing tournament selection.")
(defvar *method-of-generation* :unbound
  "Can be any one of :grow, :full, :ramped-half-and-half")
Third, the kernel defines three variables used by the randomizer and for
bookkeeping purposes.
(defvar *seed* :unbound
  "The seed for the Park-Miller congruential randomizer.")
(defvar *best-of-run-individual* :unbound
  "The best individual found during this run.")
(defvar *generation-of-best-of-run-individual* :unbound
  "The generation at which the best-of-run individual was found.")
Fourth, the kernel contains the top level function
run-genetic-programming-system that controls the genetic programming system.
This is the function that the user uses to invoke the kernel. It has four
mandatory arguments.
The first mandatory argument to this function is problem-function. When
the kernel calls problem-function, this function delivers to the kernel the
functions that are needed by the kernel to define a specific problem.
The second mandatory argument to the run-genetic-programming-system
function is the seed to the randomizer.
The third mandatory argument is the maximum-generations, G, to be run.
The fourth mandatory argument is the size-of-population, M.
After the four mandatory arguments, there may be any number (up to M)
of optional seeded-program arguments.
This function calls the problem-function (using funcall) and thereby
obtains the problem specific functions that the user has defined in the
problem specific part of the code.
This function does some cursory checking of the validity of arguments to
this function using four assert clauses.
(defun run-genetic-programming-system
       (problem-function
        seed
        maximum-generations
        size-of-population
        &rest seeded-programs)
  ;; Check validity of some arguments
  (assert (and (integerp maximum-generations)
               (not (minusp maximum-generations)))
          (maximum-generations)
          "Maximum-generations must be a non-negative ~
           integer, not ~S" maximum-generations)
  (assert (and (integerp size-of-population)
               (plusp size-of-population))
          (size-of-population)
          "Size-Of-Population must be a positive integer, ~
           not ~S" size-of-population)
  (assert (or (and (symbolp problem-function)
                   (fboundp problem-function))
              (functionp problem-function))
          (problem-function)
          "Problem-Function must be a function.")
  (assert (numberp seed) (seed)
          "The randomizer seed must be a number")
  ;; Set the global randomizer seed.
  (setf *seed* (coerce seed 'double-float))
  ;; Initialize best-of-run recording variables
  (setf *generation-of-best-of-run-individual* 0)
  (setf *best-of-run-individual* nil)
  ;; Get the problem-specific functions needed to
  ;; specify this problem as returned by a call to
  ;; problem-function
  (multiple-value-bind (adf0-function-set-creator
                        adf1-function-set-creator
                        rpb-function-set-creator
                        adf0-terminal-set-creator
                        adf1-terminal-set-creator
                        rpb-terminal-set-creator
                        fitness-cases-creator
                        fitness-function
                        parameter-definer
                        termination-predicate)
      (funcall problem-function)
    ;; Get the function sets and associated
    ;; argument maps
    (multiple-value-bind (adf0-function-set adf0-argument-map)
        (funcall adf0-function-set-creator)
      (multiple-value-bind (adf1-function-set adf1-argument-map)
          (funcall adf1-function-set-creator)
        (multiple-value-bind (rpb-function-set rpb-argument-map)
            (funcall rpb-function-set-creator)
          ;; Set up the parameters using parameter-definer
          (funcall parameter-definer)
          ;; Print out parameters report
          (describe-parameters-for-run
            maximum-generations size-of-population)
          ;; Set up the terminal-set using terminal-set-creator
          (let ((adf0-terminal-set
                  (funcall adf0-terminal-set-creator))
                (adf1-terminal-set
                  (funcall adf1-terminal-set-creator))
                (rpb-terminal-set
                  (funcall rpb-terminal-set-creator)))
            ;; Create the population
            (let ((population
                    (create-population
                      size-of-population
                      adf0-function-set adf0-argument-map
                      adf0-terminal-set
                      adf1-function-set adf1-argument-map
                      adf1-terminal-set
                      rpb-function-set rpb-argument-map
                      rpb-terminal-set
                      seeded-programs)))
              ;; Define the fitness cases using the
              ;; fitness-cases-creator function
              (let ((fitness-cases
                      (funcall fitness-cases-creator))
                    ;; New-Programs is used in the breeding of
                    ;; the new population. Create it here to
                    ;; reduce consing.
                    (new-programs
                      (make-array size-of-population)))
                ;; Now run the Genetic Programming Paradigm using
                ;; the fitness-function and termination-predicate provided
                (execute-generations
                  population new-programs fitness-cases
                  maximum-generations fitness-function
                  termination-predicate
                  adf0-function-set adf0-argument-map
                  adf0-terminal-set
                  adf1-function-set adf1-argument-map
                  adf1-terminal-set
                  rpb-function-set rpb-argument-map
                  rpb-terminal-set)
                ;; Finally print out a report
                (report-on-run)
                ;; Return the population and fitness cases
                ;; (for debugging)
                (values population fitness-cases)))))))))
Fifth, the kernel contains four functions for printing out various reports.
(defun report-on-run ()
  "Prints out the best-of-run individual."
  (let ((*print-pretty* t))
    (format t "~5%The best-of-run individual program ~
               for this run was found on ~%generation ~D and ~
               had a standardized fitness measure ~
               of ~D and ~D hit~P. ~%It was: ~%~S"
            *generation-of-best-of-run-individual*
            (individual-standardized-fitness
              *best-of-run-individual*)
            (individual-hits *best-of-run-individual*)
            (individual-hits *best-of-run-individual*)
            (individual-program *best-of-run-individual*))))
(defun report-on-generation (generation-number population)
  "Prints out the best individual at the end of each generation"
  (let ((best-individual (aref population 0))
        (size-of-population (length population))
        (sum 0.0)
        (*print-pretty* t))
    ;; Add up all of the standardized fitnesses to get average
    (dotimes (index size-of-population)
      (incf sum (individual-standardized-fitness
                  (aref population index))))
    (format t "~2%Generation ~D: Average standardized-fitness ~
               = ~S. ~%The best individual program of the population ~
               had a ~%standardized fitness measure of ~D ~
               and ~D hit~P. ~%It was: ~%~S"
            generation-number (/ sum (length population))
            (individual-standardized-fitness best-individual)
            (individual-hits best-individual)
            (individual-hits best-individual)
            (individual-program best-individual))))
(defun print-population (population)
  "Given a population, this prints it out (for debugging)"
  (let ((*print-pretty* t))
    (dotimes (index (length population))
      (let ((individual (aref population index)))
        (format t "~&~D ~S ~S"
                index
                (individual-standardized-fitness individual)
                (individual-program individual))))))
(defun describe-parameters-for-run
       (maximum-generations size-of-population)
  "Lists the parameter settings for this run."
  (format t "~2%Parameters used for this run.~
             ~%=============================")
  (format t "~%Maximum number of Generations:~50T~D"
          maximum-generations)
  (format t "~%Size of Population:~50T~D" size-of-population)
  (format t "~%Maximum depth of new individuals:~50T~D"
          *max-depth-for-new-individuals*)
  (format t "~%Maximum depth of new subtrees for mutants:~50T~D"
          *max-depth-for-new-subtrees-in-mutants*)
  (format t
          "~%Maximum depth of individuals after crossover:~50T~D"
          *max-depth-for-individuals-after-crossover*)
  (format t "~%Reproduction fraction:~50T~D"
          *reproduction-fraction*)
  (format t "~%Crossover at any point fraction:~50T~D"
          *crossover-at-any-point-fraction*)
  (format t "~%Crossover at function points fraction:~50T~D"
          *crossover-at-function-point-fraction*)
  (format t "~%Number of fitness cases:~50T~D"
          *number-of-fitness-cases*)
  (format t "~%Selection method: ~50T~A" *method-of-selection*)
  (format t "~%Tournament group size: ~50T~A" *tournament-size*)
  (format t "~%Generation method: ~50T~A" *method-of-generation*)
  (format t "~%Randomizer seed: ~50T~D" *seed*))
Sixth, the kernel contains a group of six functions for creating the
individual programs for generation 0. These same functions are also used for
creating tree fragments if we happen to be using the mutation operation. The
function create-population causes the population of individuals to be created
in a form specified by the variable *method-of-generation*, which
can be :full, :grow or :ramped-half-and-half, these being the methods
described in Genetic Programming (section 6.2). Small changes to this function
allow different generative methods, such as a ramped, full, or grow method.
The choose-from-terminal-set function in this group creates the random
constants (if any) for the initial random programs. Small changes to this
function permit different ranges and granularity for the random constants. We
use the do macro of Common LISP in create-population and throughout
the kernel rather than the more convenient loop macro, because loop may
not be supported in some generally available, older implementations of
Common LISP.
(defvar *generation-0-uniquifier-table*
  (make-hash-table :test #'equal)
  "Used to guarantee that all generation 0 individuals
   are unique")
(defun create-program-branch
       (function-set argument-map terminal-set
        minimum-depth-of-trees maximum-depth-of-trees
        individual-index full-cycle-p)
  "Creates a complete branch for an ADF-containing program."
  (create-individual-subtree
    function-set argument-map
    terminal-set
    (ecase *method-of-generation*
      ((:full :grow) maximum-depth-of-trees)
      (:ramped-half-and-half
        (+ minimum-depth-of-trees
           (mod individual-index
                (- maximum-depth-of-trees
                   minimum-depth-of-trees)))))
    t
    (ecase *method-of-generation*
      (:full t)
      (:grow nil)
      (:ramped-half-and-half full-cycle-p))))
(defun create-new-program (individual-index full-cycle-p
                           minimum-depth-of-trees
                           maximum-depth-of-trees
                           adf0-function-set adf0-argument-map
                           adf0-terminal-set
                           adf1-function-set adf1-argument-map
                           adf1-terminal-set
                           rpb-function-set rpb-argument-map
                           rpb-terminal-set)
  "Creates a new individual with ADF structure."
  (make-adf-program
    :adf0
    (create-program-branch
      adf0-function-set adf0-argument-map
      adf0-terminal-set minimum-depth-of-trees
      maximum-depth-of-trees individual-index full-cycle-p)
    :adf1
    (create-program-branch
      adf1-function-set adf1-argument-map
      adf1-terminal-set minimum-depth-of-trees
      maximum-depth-of-trees individual-index full-cycle-p)
    :rpb
    (create-program-branch
      rpb-function-set rpb-argument-map
      rpb-terminal-set minimum-depth-of-trees
      maximum-depth-of-trees individual-index full-cycle-p)))
(defun create-population (size-of-population
                          adf0-function-set adf0-argument-map
                          adf0-terminal-set
                          adf1-function-set adf1-argument-map
                          adf1-terminal-set
                          rpb-function-set rpb-argument-map
                          rpb-terminal-set
                          seeded-programs)
  "Creates the population. This is an array of size
   size-of-population that is initialized to contain individual
   records. The Program slot of each individual is initialized
   to a suitable random program except for the first N programs,
   where N = (length seeded-programs). For these first N
   individuals the individual is initialized with the respective
   seeded program. This is very useful in debugging."
  (let ((population (make-array size-of-population))
        (minimum-depth-of-trees 1)
        (attempts-at-this-individual 0)
        (full-cycle-p nil))
    (do ((individual-index 0))
        ((>= individual-index size-of-population))
      (when (zerop
              (mod individual-index
                   (max 1 (- *max-depth-for-new-individuals*
                             minimum-depth-of-trees))))
        (setf full-cycle-p (not full-cycle-p)))
      (let ((new-program
              (if (< individual-index (length seeded-programs))
                  ;; Pick a seeded individual
                  (nth individual-index seeded-programs)
                  ;; Create a new random program.
                  (create-new-program
                    individual-index full-cycle-p
                    minimum-depth-of-trees
                    ;; We count one level of depth for the
                    ;; root above all of the branches that
                    ;; get evolved.
                    (- *max-depth-for-new-individuals* 1)
                    adf0-function-set adf0-argument-map
                    adf0-terminal-set
                    adf1-function-set adf1-argument-map
                    adf1-terminal-set
                    rpb-function-set rpb-argument-map
                    rpb-terminal-set))))
        ;; Check if we have already created this program.
        ;; If not then store it and move on.
        ;; If we have then try again.
        (let ((program-as-list
                (list (adf-program-adf0 new-program)
                      (adf-program-adf1 new-program)
                      (adf-program-rpb new-program))))
          ;; Turn the defstruct representation of the
          ;; program into a list so that it can be
          ;; compared using an EQUAL hash table.
          ;; defstruct instances have to be compared with EQUALP
          (cond ((< individual-index (length seeded-programs))
                 (setf (aref population individual-index)
                       (make-individual :program new-program))
                 (incf individual-index))
                ((not (gethash program-as-list
                               *generation-0-uniquifier-table*))
                 (setf (aref population individual-index)
                       (make-individual :program new-program))
                 (setf (gethash program-as-list
                                *generation-0-uniquifier-table*)
                       t)
                 (setf attempts-at-this-individual 0)
                 (incf individual-index))
                ((> attempts-at-this-individual 20)
                 ;; Then this depth has probably filled up, so
                 ;; bump the depth counter.
                 (incf minimum-depth-of-trees)
                 ;; Bump the max depth too to keep in line with
                 ;; new minimum.
                 (setf *max-depth-for-new-individuals*
                       (max *max-depth-for-new-individuals*
                            minimum-depth-of-trees)))
                (:otherwise
                 (incf attempts-at-this-individual))))))
    ;; Flush out uniquifier table so that no pointers
    ;; are kept to generation 0 individuals.
    (clrhash *generation-0-uniquifier-table*)
    ;; Return the population that we've just created.
    population))
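The depth ramp that create-population and create-program-branch produce under :ramped-half-and-half can be traced in isolation. The following sketch mirrors only the arithmetic of those two functions; ramped-schedule is a hypothetical helper, and it assumes that the minimum depth stays at 1 (that is, no duplicate programs force the minimum depth to be bumped) and that the maximum depth for new individuals is greater than 2.

```lisp
(defun ramped-schedule (count max-depth-for-new-individuals)
  "Returns a list of (depth full-p) pairs for individual indices
   0 .. COUNT-1, mirroring the depth and full-cycle-p computations
   of CREATE-POPULATION and CREATE-PROGRAM-BRANCH."
  (let ((minimum-depth 1)
        ;; One level of depth is reserved for the root above the
        ;; branches, as in CREATE-POPULATION.
        (maximum-depth (- max-depth-for-new-individuals 1))
        (full-cycle-p nil)
        (schedule '()))
    (dotimes (index count (nreverse schedule))
      ;; Toggle between the full and grow methods once per cycle.
      (when (zerop (mod index
                        (max 1 (- max-depth-for-new-individuals
                                  minimum-depth))))
        (setf full-cycle-p (not full-cycle-p)))
      (push (list (+ minimum-depth
                     (mod index (- maximum-depth minimum-depth)))
                  full-cycle-p)
            schedule))))
```

For the even-5-parity settings, (ramped-schedule 8 5) returns ((1 T) (2 T) (3 T) (1 T) (2 NIL) (3 NIL) (1 NIL) (2 NIL)): the allowable depth cycles through 1, 2, 3 while the full and grow methods alternate every four individuals.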
(defun choose-from-terminal-set (terminal-set)
  "Chooses a random terminal from the terminal set.
   If the terminal chosen is the ephemeral
   :Floating-Point-Random-Constant,
   then a floating-point single precision random constant
   is created in the range -5.0->5.0.
   If :Integer-Random-Constant is chosen then an integer random
   constant is generated in the range -10 to +10."
  (let ((choice (nth (random-integer (length terminal-set))
                     terminal-set)))
    (case choice
      (:floating-point-random-constant
        ;; pick a random number in the range -5.0 -> +5.0.
        ;; Coerce it to be single precision floating-point.
        ;; Double precision is more expensive.
        ;; A similar clause to this could be used to coerce it
        ;; to double precision if you really need
        ;; double precision.
        ;; This is also the place to modify if you need a range
        ;; other than -5.0 -> +5.0.
        (coerce (- (random-floating-point-number 10.0) 5.0)
                'single-float))
      (:integer-random-constant
        ;; pick a random integer in the range -10 -> +10.
        (- (random-integer 21) 10))
      (otherwise choice))))
(defun create-individual-subtree
       (function-set argument-map terminal-set
        allowable-depth top-node-p full-p)
  "Creates a subtree recursively using the specified functions
   and terminals. Argument map is used to determine how many
   arguments each function in the function set is supposed to
   have if it is selected. Allowable depth is the remaining
   depth of the tree we can create, when we hit zero we will
   only select terminals. Top-node-p is true only when we
   are being called as the top node in the tree. This allows
   us to make sure that we always put a function at the top
   of the tree. Full-p indicates whether this individual
   is to be maximally bushy or not."
  (cond ((<= allowable-depth 0)
         ;; We've reached maxdepth, so just pack a terminal.
         (choose-from-terminal-set terminal-set))
        ((or full-p top-node-p)
         ;; We are the top node or are a full tree,
         ;; so pick only a function.
         (let ((choice (random-integer (length function-set))))
           (let ((function (nth choice function-set))
                 (number-of-arguments
                   (nth choice argument-map)))
             (cons function
                   (create-arguments-for-function
                     number-of-arguments function-set
                     argument-map terminal-set
                     (- allowable-depth 1) full-p)))))
        (:otherwise
         ;; choose one from the bag of functions and terminals.
         (let ((choice (random-integer
                         (+ (length terminal-set)
                            (length function-set)))))
           (if (< choice (length function-set))
               ;; We chose a function, so pick it out and go
               ;; on creating the tree down from here.
               (let ((function (nth choice function-set))
                     (number-of-arguments
                       (nth choice argument-map)))
                 (cons function
                       (create-arguments-for-function
                         number-of-arguments function-set
                         argument-map terminal-set
                         (- allowable-depth 1) full-p)))
               ;; We chose an atom, so pick it out.
               (choose-from-terminal-set terminal-set))))))
(defun create-arguments-for-function
       (number-of-arguments function-set
        argument-map terminal-set allowable-depth
        full-p)
  "Creates the argument list for a node in the tree.
   Number-Of-Arguments is the number of arguments still
   remaining to be created. Each argument is created
   in the normal way using Create-individual-subtree."
  (if (= number-of-arguments 0)
      nil
      (cons (create-individual-subtree
              function-set argument-map terminal-set
              allowable-depth nil full-p)
            (create-arguments-for-function
              (- number-of-arguments 1) function-set
              argument-map terminal-set
              allowable-depth full-p))))
Seventh, the kernel contains a group of five functions to execute the main
parts of the genetic programming system.
(defun execute-generations
       (population new-programs fitness-cases maximum-generations
        fitness-function termination-predicate
        adf0-function-set adf0-argument-map
        adf0-terminal-set
        adf1-function-set adf1-argument-map
        adf1-terminal-set
        rpb-function-set rpb-argument-map
        rpb-terminal-set)
  "Loops until the user's termination predicate says to stop."
  (do ((current-generation 0 (+ 1 current-generation)))
      ;; loop incrementing current generation until
      ;; termination-predicate succeeds.
      ((let ((best-of-generation (aref population 0)))
         (funcall
           termination-predicate current-generation
           maximum-generations
           (individual-standardized-fitness best-of-generation)
           (individual-hits best-of-generation))))
    (when (> current-generation 0)
      ;; Breed the new population to use on this generation
      ;; (except gen 0, of course).
      (breed-new-population population new-programs
                            adf0-function-set adf0-argument-map
                            adf0-terminal-set
                            adf1-function-set adf1-argument-map
                            adf1-terminal-set
                            rpb-function-set rpb-argument-map
                            rpb-terminal-set))
    ;; Clean out the fitness measures.
    (zeroize-fitness-measures-of-population population)
    ;; Measure the fitness of each individual. Fitness values
    ;; are stored in the individuals themselves.
    (evaluate-fitness-of-population
      population fitness-cases fitness-function)
    ;; Normalize fitness in preparation for crossover, etc.
    (normalize-fitness-of-population population)
    ;; Sort the population so that the roulette wheel is easy.
    (sort-population-by-fitness population)
    ;; Keep track of best-of-run individual
    (let ((best-of-generation (aref population 0)))
      (when (or (not *best-of-run-individual*)
                (> (individual-standardized-fitness
                     *best-of-run-individual*)
                   (individual-standardized-fitness
                     best-of-generation)))
        (setf *best-of-run-individual*
              (copy-individual best-of-generation))
        (setf *generation-of-best-of-run-individual*
              current-generation)))
    ;; Print out the results for this generation.
    (report-on-generation current-generation population)))
(defun zeroize-fitness-measures-of-population (population)
  "Clean out the statistics in each individual in the
   population. This is not strictly necessary, but it helps to
   avoid confusion that might be caused if, for some reason, we
   land in the debugger and there are fitness values associated
   with the individual records that actually matched the program
   that used to occupy this individual record."
  (dotimes (individual-index (length population))
    (let ((individual (aref population individual-index)))
      (setf (individual-standardized-fitness individual) 0.0)
      (setf (individual-adjusted-fitness individual) 0.0)
      (setf (individual-normalized-fitness individual) 0.0)
      (setf (individual-hits individual) 0))))
(defun evaluate-fitness-of-population (population fitness-cases
                                       fitness-function)
  "Loops over the individuals in the population evaluating and
   recording the fitness and hits."
  (dotimes (individual-index (length population))
    (let ((individual (aref population individual-index)))
      (multiple-value-bind (standardized-fitness hits)
          (funcall fitness-function
                   (individual-program individual)
                   fitness-cases)
        ;; Record fitness and hits for this individual.
        (setf (individual-standardized-fitness individual)
              standardized-fitness)
        (setf (individual-hits individual) hits)))))
(defun normalize-fitness-of-population (population)
  "Computes the normalized and adjusted fitness of each
   individual in the population."
  (let ((sum-of-adjusted-fitnesses 0.0))
    (dotimes (individual-index (length population))
      (let ((individual (aref population individual-index)))
        ;; Set the adjusted fitness.
        (setf (individual-adjusted-fitness individual)
              (/ 1.0 (+ 1.0 (individual-standardized-fitness
                              individual))))
        ;; Add up the adjusted fitnesses so that we can
        ;; normalize them.
        (incf sum-of-adjusted-fitnesses
              (individual-adjusted-fitness individual))))
    ;; Loop through population normalizing the adjusted fitness.
    (dotimes (individual-index (length population))
      (let ((individual (aref population individual-index)))
        (setf (individual-normalized-fitness individual)
              (/ (individual-adjusted-fitness individual)
                 sum-of-adjusted-fitnesses))))))
(defun sort-population-by-fitness (population)
  "Sorts the population according to normalized fitness.
   The population array is destructively modified."
  (sort population #'> :key #'individual-normalized-fitness))
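The relationship between standardized, adjusted, and normalized fitness computed above can be checked on bare numbers. The following is a minimal sketch and is not part of the kernel: the function name normalized-fitnesses is hypothetical, and it works on a plain list of standardized fitness values rather than on individual records.

```lisp
(defun normalized-fitnesses (standardized-fitnesses)
  "Given a list of standardized fitness values (0 is best),
   returns a list of the corresponding normalized fitnesses."
  (let* ((adjusted (mapcar #'(lambda (s) (/ 1.0 (+ 1.0 s)))
                           standardized-fitnesses))
         (sum (reduce #'+ adjusted)))
    (mapcar #'(lambda (a) (/ a sum)) adjusted)))

;; Standardized fitnesses 0, 1, and 3 give adjusted fitnesses
;; 1, 1/2, and 1/4, which sum to 7/4.  The normalized values are
;; therefore approximately 4/7, 2/7, and 1/7.
(normalized-fitnesses '(0 1 3))
```

Because a smaller standardized fitness yields a larger normalized fitness, sorting with #'> places the best individual at index 0, which is where the run loop looks for the best-of-generation individual.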
Eighth, the kernel contains six functions for controlling the breeding of the
new population. This involves executing the appropriate genetic operation
(e.g., crossover, reproduction, or mutation) with the appropriate probability.
The *method-of-selection* may be either :fitness-proportionate or
:tournament.
(defun breed-new-population
       (population new-programs
        adf0-function-set adf0-argument-map adf0-terminal-set
        adf1-function-set adf1-argument-map adf1-terminal-set
        rpb-function-set rpb-argument-map rpb-terminal-set)
  "Controls the actual breeding of the new population.
   Loops through the population executing each operation
   (e.g., crossover, fitness-proportionate reproduction,
   mutation) until it has reached the specified fraction.
   The new programs that are created are stashed in new-programs
   until we have exhausted the population, then we copy the new
   individuals into the old ones, thus avoiding consing a new
   bunch of individuals."
  (let ((population-size (length population)))
    (do ((index 0)
         (fraction 0 (/ index population-size)))
        ((>= index population-size))
      (let ((individual-1
              (find-individual population)))
        (cond ((and (< index (- population-size 1))
                    (< fraction
                       (+ *crossover-at-function-point-fraction*
                          *crossover-at-any-point-fraction*)))
               (multiple-value-bind (new-male new-female)
                   (funcall
                     (if (< fraction
                            *crossover-at-function-point-fraction*)
                         'crossover-at-function-points
                         'crossover-at-any-points)
                     individual-1
                     (find-individual population))
                 (setf (aref new-programs index) new-male)
                 (setf (aref new-programs (+ 1 index))
                       new-female))
               (incf index 2))
              ((< fraction
                  (+ *reproduction-fraction*
                     *crossover-at-function-point-fraction*
                     *crossover-at-any-point-fraction*))
               (setf (aref new-programs index) individual-1)
               (incf index 1))
              (:otherwise
               (setf (aref new-programs index)
                     (mutate individual-1
                             adf0-function-set adf0-argument-map
                             adf0-terminal-set
                             adf1-function-set adf1-argument-map
                             adf1-terminal-set
                             rpb-function-set rpb-argument-map
                             rpb-terminal-set))
               (incf index 1)))))
    (dotimes (index population-size)
      (setf (individual-program (aref population index))
            (aref new-programs index)))))
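The way the COND in breed-new-population maps a slot of the new population to a genetic operation can be sketched in isolation. This is an illustrative sketch only: operation-for-slot is a hypothetical helper that takes the three fractions as arguments instead of reading the kernel's special variables, and it returns a keyword naming the operation rather than performing it.

```lisp
(defun operation-for-slot (index population-size
                           function-point-fraction
                           any-point-fraction
                           reproduction-fraction)
  "Returns a keyword naming the operation that would fill slot
   INDEX, following the same tests as the breeding loop."
  (let ((fraction (/ index population-size))
        (crossover-fraction (+ function-point-fraction
                               any-point-fraction)))
    (cond ((and (< index (- population-size 1))
                (< fraction crossover-fraction))
           ;; Crossover fills two slots, so it is never started
           ;; in the last slot.
           (if (< fraction function-point-fraction)
               :crossover-at-function-points
               :crossover-at-any-points))
          ((< fraction (+ reproduction-fraction
                          crossover-fraction))
           :reproduction)
          (t :mutation))))

;; With a population of 10 and fractions 7/10, 2/10, and 1/10:
(operation-for-slot 0 10 7/10 2/10 1/10)
(operation-for-slot 8 10 7/10 2/10 1/10)
(operation-for-slot 9 10 7/10 2/10 1/10)
```

Under these assumed fractions, slot 0 is filled by crossover at function points, slot 8 by crossover at any point, and slot 9 by reproduction, because a crossover cannot begin in the final slot.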
(defun find-individual (population)
  "Finds an individual in the population according to the
   defined selection method."
  (ecase *method-of-selection*
    (:tournament (find-individual-using-tournament-selection
                   population))
    (:fitness-proportionate-with-over-selection
      (find-fitness-proportionate-individual
        (random-floating-point-number-with-over-selection
          population)
        population))
    (:fitness-proportionate
      (find-fitness-proportionate-individual
        (random-floating-point-number 1.0) population))))
(defun random-floating-point-number-with-over-selection
       (population)
  "Picks a random number between 0.0 and 1.0 biased using the
   over-selection method."
  (let ((pop-size (length population)))
    (when (< pop-size 1000)
      (error "A population size of ~D is too small ~
              for over-selection." pop-size))
    (let ((boundary (/ 320.0 pop-size)))
      ;; The boundary between the over and under selected parts.
      (if (< (random-floating-point-number 1.0) 0.8)
          ;; 80% are in the over-selected part.
          (random-floating-point-number boundary)
          (+ boundary
             (random-floating-point-number
               (- 1.0 boundary)))))))
(defun pick-k-random-individual-indices (k max)
  "Returns a list of K random numbers between 0 and (- max 1)."
  (let ((numbers nil))
    (loop for number = (random-integer max)
          unless (member number numbers :test #'eql)
          do (push number numbers)
          until (= (length numbers) k))
    numbers))
(defun find-individual-using-tournament-selection (population)
  "Picks *tournament-size* individuals from the population at
   random and returns the best one."
  (let ((numbers (pick-k-random-individual-indices
                   *tournament-size* (length population))))
    (loop with best = (aref population (first numbers))
          with best-fitness
               = (individual-standardized-fitness best)
          for number in (rest numbers)
          for individual = (aref population number)
          for this-fitness
              = (individual-standardized-fitness individual)
          when (< this-fitness best-fitness)
          do (setf best individual)
             (setf best-fitness this-fitness)
          finally (return (individual-program best)))))
(defun find-fitness-proportionate-individual
       (after-this-fitness population)
  "Finds an individual in the specified population whose
   normalized fitness is greater than the specified value.
   All we need to do is count along the population from the
   beginning adding up the fitness until we get past the
   specified point."
  (let ((sum-of-fitness 0.0)
        (population-size (length population)))
    (let ((index-of-selected-individual
            (do ((index 0 (+ index 1)))
                ;; Exit condition
                ((or (>= index population-size)
                     (>= sum-of-fitness after-this-fitness))
                 (if (>= index population-size)
                     (- (length population) 1)
                     (- index 1)))
              ;; Body.  Sum up the fitness values.
              (incf sum-of-fitness
                    (individual-normalized-fitness
                      (aref population index))))))
      (individual-program
        (aref population index-of-selected-individual)))))
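The "count along the population" step of fitness-proportionate selection can be sketched on a bare vector of normalized fitness values. This is an illustration under stated assumptions, not the kernel's code: roulette-index is a hypothetical name, the vector is assumed to be sorted best first and to sum to 1.0, and the caller is assumed to supply a spin value greater than 0.0.

```lisp
(defun roulette-index (normalized-fitnesses spin)
  "NORMALIZED-FITNESSES is a vector summing to 1.0 and SPIN is a
   number with 0.0 < SPIN <= 1.0.  Returns the index of the first
   entry at which the running sum reaches SPIN.  If rounding
   keeps the sum below SPIN, the last index is returned."
  (let ((running-sum 0.0)
        (last-index (- (length normalized-fitnesses) 1)))
    (dotimes (index (length normalized-fitnesses) last-index)
      (incf running-sum (aref normalized-fitnesses index))
      (when (>= running-sum spin)
        (return index)))))

;; The entry with normalized fitness 0.5 owns half of the wheel.
(roulette-index #(0.5 0.3 0.2) 0.4)    ; index 0
(roulette-index #(0.5 0.3 0.2) 0.6)    ; index 1
(roulette-index #(0.5 0.3 0.2) 0.95)   ; index 2
```

An individual is chosen with probability equal to its normalized fitness when the spin is uniform, which is why the population is normalized and sorted before breeding.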
Ninth, the kernel contains a group of 10 functions for performing structure-preserving crossover at any point.
(defun select-branch (within-program)
  "Returns two values:
   - A keyword in {:ADF0, :ADF1, :RPB} to denote a
     branch selected at random.  The selection of the
     branch is biased according to the number of
     points in that branch.
   - The subtree for the branch selected."
  (let ((adf0 (adf-program-adf0 within-program))
        (adf1 (adf-program-adf1 within-program))
        (rpb (adf-program-rpb within-program)))
    (let ((adf0-points (count-crossover-points adf0))
          (adf1-points (count-crossover-points adf1))
          (rpb-points (count-crossover-points rpb)))
      (let ((selected-point
              (random-integer
                (+ adf0-points adf1-points rpb-points))))
        (cond ((< selected-point adf0-points)
               (values :adf0 adf0))
              ((< selected-point (+ adf1-points adf0-points))
               (values :adf1 adf1))
              (t (values :rpb rpb)))))))
(defun adf-program-branch (branch program)
  "Returns a branch from Program selected by the keyword
   Branch."
  (ecase branch
    (:adf0 (adf-program-adf0 program))
    (:adf1 (adf-program-adf1 program))
    (:rpb (adf-program-rpb program))))
(defun copy-individual-substituting-branch
       (branch new-branch-subtree program-to-copy)
  "Makes a copy of Program-To-Copy only substituting
   the branch selected by Branch with the new branch
   subtree created by crossover."
  (make-adf-program
    :adf0 (if (eq :adf0 branch)
              new-branch-subtree
              (copy-tree (adf-program-adf0 program-to-copy)))
    :adf1 (if (eq :adf1 branch)
              new-branch-subtree
              (copy-tree (adf-program-adf1 program-to-copy)))
    :rpb (if (eq :rpb branch)
             new-branch-subtree
             (copy-tree (adf-program-rpb program-to-copy)))))
(defun crossover-selecting-branch
       (how-to-crossover-function male female)
  "Performs crossover on the programs Male and Female by calling
   the function How-To-Crossover-Function, which will cause it
   to perform crossover at either function points or at any
   point.
   The crossover happens between a compatible pair of branches
   in the two parents.
   Once the crossover has happened the function returns two new
   individuals to insert into the next generation."
  (let ((branch (select-branch male)))
    (multiple-value-bind (new-male-branch new-female-branch)
        (funcall how-to-crossover-function
                 (adf-program-branch branch male)
                 (adf-program-branch branch female))
      (values (copy-individual-substituting-branch
                branch new-male-branch male)
              (copy-individual-substituting-branch
                branch new-female-branch female)))))
(defun crossover-at-any-points (male female)
  "Performs crossover on the programs at any point
   in the trees."
  (crossover-selecting-branch
    #'crossover-at-any-points-within-branch male female))
(defun crossover-at-any-points-within-branch (male female)
  "Performs crossover on the program branches at any point
   in the subtrees."
  ;; Pick points in the respective trees
  ;; on which to perform the crossover.
  (let ((male-point
          (random-integer (count-crossover-points male)))
        (female-point
          (random-integer (count-crossover-points female))))
    ;; First, copy the trees because we destructively modify the
    ;; new individuals to do the crossover.  Reselection is
    ;; allowed in the original population.  Not copying would
    ;; cause the individuals in the old population to
    ;; be modified.
    (let ((new-male (list (copy-tree male)))
          (new-female (list (copy-tree female))))
      ;; Get the pointers to the subtrees indexed by male-point
      ;; and female-point
      (multiple-value-bind (male-subtree-pointer male-fragment)
          (get-subtree (first new-male) new-male male-point)
        (multiple-value-bind
            (female-subtree-pointer female-fragment)
            (get-subtree
              (first new-female) new-female female-point)
          ;; Modify the new individuals by smashing in the
          ;; (copied) subtree from the old individual.
          (setf (first male-subtree-pointer) female-fragment)
          (setf (first female-subtree-pointer) male-fragment)))
      ;; Make sure that the new individuals aren't too big.
      (validate-crossover male new-male female new-female))))
(defun count-crossover-points (program)
  "Counts the number of points in the tree (program).
   This includes functions as well as terminals."
  (if (consp program)
      (+ 1 (reduce #'+ (mapcar #'count-crossover-points
                               (rest program))))
      1))
(defun max-depth-of-tree (tree)
  "Returns the depth of the deepest branch of the
   tree (program)."
  (if (consp tree)
      (+ 1 (if (rest tree)
               (apply #'max
                      (mapcar #'max-depth-of-tree (rest tree)))
               0))
      1))
(defun get-subtree (tree pointer-to-tree index)
  "Given a tree or subtree, a pointer to that tree/subtree and
   an index return the component subtree that is numbered by
   Index.  We number left to right, depth first."
  (if (= index 0)
      (values pointer-to-tree (copy-tree tree) index)
      (if (consp tree)
          (do* ((tail (rest tree) (rest tail))
                (argument (first tail) (first tail)))
               ((not tail) (values nil nil index))
            (multiple-value-bind
                (new-pointer new-tree new-index)
                (get-subtree argument tail (- index 1))
              (if (= new-index 0)
                  (return
                    (values new-pointer new-tree new-index))
                  (setf index new-index))))
          (values nil nil index))))
(defun validate-crossover (male new-male female new-female)
  "Given the old and new males and females from a crossover
   operation check to see whether we have exceeded the maximum
   allowed depth.  If either of the new individuals has exceeded
   the maxdepth then the old individual is used."
  (let ((male-depth (max-depth-of-tree (first new-male)))
        (female-depth (max-depth-of-tree (first new-female))))
    (values
      (if (or (= 1 male-depth)
              (>= male-depth ;; >= counts 1 depth for root above
                             ;; branches.
                  *max-depth-for-individuals-after-crossover*))
          male
          (first new-male))
      (if (or (= 1 female-depth)
              (>= female-depth
                  *max-depth-for-individuals-after-crossover*))
          female
          (first new-female)))))
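The depth-first, left-to-right numbering of points used by count-crossover-points and get-subtree can be illustrated with a small non-destructive sketch. The names count-points and point-at are hypothetical and are not part of the kernel; unlike get-subtree, point-at returns only the subtree itself and not a pointer suitable for destructive replacement.

```lisp
(defun count-points (tree)
  "Counts every point (function or terminal) in TREE."
  (if (consp tree)
      (+ 1 (reduce #'+ (mapcar #'count-points (rest tree))))
      1))

(defun point-at (tree index)
  "Returns the subtree numbered INDEX, counting depth first,
   left to right, with the root as point 0.  Returns NIL if
   INDEX is out of range."
  (let ((counter -1))
    (labels ((walk (node)
               (incf counter)
               (if (= counter index)
                   (return-from point-at node)
                   (when (consp node)
                     (mapc #'walk (rest node))))))
      (walk tree)
      nil)))

;; The tree (+ x (* y 2)) has five points, numbered
;; 0: (+ x (* y 2))   1: x   2: (* y 2)   3: y   4: 2
(count-points '(+ x (* y 2)))
(point-at '(+ x (* y 2)) 3)
```

Choosing a crossover point uniformly from the range 0 to (count-points tree) minus 1 therefore gives every function and terminal in the branch the same chance of being selected.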
Tenth, the kernel contains a group of four functions for performing crossover restricted to function (internal) points.
(defun crossover-at-function-points (male female)
  "Performs crossover on the two programs at a function
   (internal) point in a randomly selected branch of the trees."
  (crossover-selecting-branch
    #'crossover-at-function-points-within-branch male female))
(defun crossover-at-function-points-within-branch (male female)
  "Performs crossover on the two program branches at a function
   (internal) point in the trees."
  ;; Pick the function (internal) points in the respective trees
  ;; on which to perform the crossover.
  (let ((male-point
          (random-integer (count-function-points male)))
        (female-point
          (random-integer (count-function-points female))))
    ;; Copy the trees because we destructively modify the new
    ;; individuals to do the crossover and Reselection is
    ;; allowed in the original population.  Not copying would
    ;; cause the individuals in the old population to
    ;; be modified.
    (let ((new-male (list (copy-tree male)))
          (new-female (list (copy-tree female))))
      ;; Get the pointers to the subtrees indexed by male-point
      ;; and female-point
      (multiple-value-bind (male-subtree-pointer male-fragment)
          (get-function-subtree
            (first new-male) new-male male-point)
        (multiple-value-bind
            (female-subtree-pointer female-fragment)
            (get-function-subtree
              (first new-female) new-female female-point)
          ;; Modify the new individuals by smashing in
          ;; the (copied) subtree from the old individual.
          (setf (first male-subtree-pointer) female-fragment)
          (setf (first female-subtree-pointer) male-fragment)))
      ;; Make sure that the new individuals aren't too big.
      (validate-crossover male new-male female new-female))))
(defun count-function-points (program)
  "Counts the number of function (internal) points
   in the program."
  (if (consp program)
      (+ 1 (reduce #'+ (mapcar #'count-function-points
                               (rest program))))
      0))
(defun get-function-subtree (tree pointer-to-tree index)
  "Given a tree or subtree, a pointer to that tree/subtree and
   an index return the component subtree that is labeled with
   an internal point that is numbered by Index.  We number left
   to right, depth first."
  (if (= index 0)
      (values pointer-to-tree (copy-tree tree) index)
      (if (consp tree)
          (do* ((tail (rest tree) (rest tail))
                (argument (first tail) (first tail)))
               ((not tail) (values nil nil index))
            (multiple-value-bind
                (new-pointer new-tree new-index)
                (if (consp argument)
                    (get-function-subtree
                      argument tail (- index 1))
                    (values nil nil index))
              (if (= new-index 0)
                  (return
                    (values new-pointer new-tree new-index))
                  (setf index new-index))))
          (values nil nil index))))
Eleventh, the kernel contains a function for performing the mutation
operation.
(defun mutate
       (program
        adf0-function-set adf0-argument-map adf0-terminal-set
        adf1-function-set adf1-argument-map adf1-terminal-set
        rpb-function-set rpb-argument-map rpb-terminal-set)
  "Mutates the argument program by picking a random point in
   the tree and substituting in a brand new subtree created in
   the same way that we create the initial random population."
  ;; Pick the mutation point.
  (multiple-value-bind (branch branch-tree)
      (select-branch program)
    (let ((mutation-point
            (random-integer
              (count-crossover-points branch-tree)))
          ;; Create a brand new subtree.
          (new-subtree
            (create-individual-subtree
              (case branch
                (:adf0 adf0-function-set)
                (:adf1 adf1-function-set)
                (:rpb rpb-function-set))
              (case branch
                (:adf0 adf0-argument-map)
                (:adf1 adf1-argument-map)
                (:rpb rpb-argument-map))
              (case branch
                (:adf0 adf0-terminal-set)
                (:adf1 adf1-terminal-set)
                (:rpb rpb-terminal-set))
              *max-depth-for-new-subtrees-in-mutants* t nil)))
      (let ((new-branch (list (copy-tree branch-tree))))
        (multiple-value-bind (subtree-pointer fragment)
            ;; Get the pointer to the mutation point.
            (get-subtree (first new-branch)
                         new-branch mutation-point)
          ;; Not interested in what we're snipping out.
          (declare (ignore fragment))
          ;; Smash in the new subtree.
          (setf (first subtree-pointer) new-subtree))
        (values (copy-individual-substituting-branch
                  branch (first new-branch) program)
                new-subtree)))))
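The effect of "smashing in" a new subtree at a numbered point can be shown without destructive modification. The sketch below is not the kernel's method: replace-point is a hypothetical helper that rebuilds the tree rather than using a pointer returned by get-subtree, and the replacement subtree is supplied by the caller instead of being generated at random.

```lisp
(defun replace-point (tree index new-subtree)
  "Returns a fresh tree in which the point numbered INDEX
   (depth first, left to right, root = 0) is replaced by
   NEW-SUBTREE.  TREE itself is not modified."
  (let ((counter -1))
    (labels ((walk (node)
               (incf counter)
               (cond ((= counter index) new-subtree)
                     ((consp node)
                      (cons (first node)
                            (mapcar #'walk (rest node))))
                     (t node))))
      (walk tree))))

;; Point 2 of (+ x (* y 2)) is the subtree (* y 2).
(replace-point '(+ x (* y 2)) 2 '(- z 1))   ; (+ x (- z 1))
(replace-point '(+ x (* y 2)) 3 'w)         ; (+ x (* w 2))
```

Mutation in the kernel follows the same idea but picks the point at random within the selected branch and bounds the depth of the new subtree by *max-depth-for-new-subtrees-in-mutants*.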
Twelfth, the kernel contains a group of three functions for generating
random numbers needed by the genetic programming system. The first
is the Park-Miller multiplicative congruential randomizer (Park and
Miller 1988).
(defun park-miller-randomizer ()
  "The Park-Miller multiplicative congruential randomizer
   (Communications of the ACM, October 88, page 1195).
   Creates pseudo random floating point numbers in the range
   0.0 < x <= 1.0.  The seed value for this randomizer is
   called *seed*, so you should record/set this if you want
   to make your runs reproducible."
  (assert (not (zerop *seed*)) () "*seed* cannot be zero.")
  (let ((multiplier 16807.0d0) ;16807 is (expt 7 5)
        (modulus 2147483647.0d0))
        ;2147483647 is (- (expt 2 31) 1)
    (let ((temp (* multiplier *seed*)))
      (setf *seed* (mod temp modulus))
      ;; Produces floating-point number
      ;; 0.0 < x <= 1.0
      (/ *seed* modulus))))
The Park-Miller randomizer can then be used to create random floating-point numbers as follows:
(defun random-floating-point-number (n)
  "Returns a pseudo random floating-point number
   in range 0.0 <= number < n"
  (let ((random-number (park-miller-randomizer)))
    ;; We subtract the randomly generated number from 1.0
    ;; before scaling so that we end up in the range
    ;; 0.0 <= x < 1.0, not 0.0 < x <= 1.0
    (* n (- 1.0d0 random-number))))
The Park-Miller randomizer can then be used to create random integers as
follows:
(defun random-integer (n)
  "Returns a pseudo-random integer in the range 0 ---> n-1."
  (let ((random-number (random-floating-point-number 1.0)))
    (floor (* n random-number))))
The user can test the correctness of his Park-Miller randomizer by starting
with a seed of 1.0 and running it 10,000 times. At that point, the seed should be
1.043618065 × 10^9. We believe that the code for the Park-Miller randomizer above
is very nearly machine independent and LISP implementation independent.
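The check described above can be carried out with a few lines of code. The following sketch is a stand-alone illustration rather than the kernel's randomizer: park-miller-seed-after is a hypothetical name, and it uses exact integer arithmetic in place of the double-precision floating-point arithmetic of park-miller-randomizer, which should produce the same sequence of seeds because every intermediate product is exactly representable in a double.

```lisp
(defun park-miller-seed-after (iterations &optional (seed 1))
  "Applies seed <- (16807 * seed) mod (2^31 - 1) ITERATIONS
   times and returns the final seed."
  (dotimes (i iterations seed)
    (declare (ignorable i))
    (setf seed (mod (* 16807 seed) 2147483647))))

;; Starting from a seed of 1, the seed after 10,000 steps
;; should be 1043618065.
(park-miller-seed-after 10000)
```

If a port of the randomizer to another machine or LISP implementation yields a different seed after 10,000 steps, the arithmetic is losing precision somewhere and runs will not be reproducible across systems.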
The programs, procedures, and applications presented in this book have
been included for their instructional value. The publisher and author offer
NO WARRANTY OF FITNESS OR MERCHANTABILITY FOR ANY PARTICULAR PURPOSE or accept any liability with respect to these programs,
procedures, and applications. U.S. patent numbers 4,935,877, 5,136,686,
5,148,513, Canadian patent number 1,311,561, Australian patent number
611,350, and U.S. and foreign patents pending.
Appendix F: Annotated Bibliography of
Genetic Programming
One hundred papers have been published on the subject of genetic programming in the 15 months since the publication of Genetic Programming in
December 1992. This group of papers does not include 49 papers of which I
am author or co-author.
Many of these 100 papers were published in the proceedings of various
conferences, including the International Conference on Simulation of Adaptive Behavior, the IEEE International Conference on Neural Networks, the
International Workshop on Artificial Life, the International Conference on
Genetic Algorithms, the International Simulation Technology Multiconference,
and the National Conference on Artificial Intelligence.
The largest existing concentration of papers on genetic programming is the
recently published book Advances in Genetic Programming edited by Kenneth
E. Kinnear, Jr. (Kinnear 1994a). In addition, the proceedings of the IEEE World
Conference on Computational Intelligence in Florida on June 26 to July 2,
1994, contain another large group of papers on genetic programming.
Many papers (including not-yet-published papers) are announced over the
genetic programming electronic mailing list or are deposited in the on-line
public repository for genetic programming as described in appendix G.
This appendix briefly reviews these 100 publications (each of which is
flagged with a = in the bibliography). The publications in this appendix are
rather arbitrarily divided into the following groups:
• Design,
• Pattern recognition and classification,
• Robotic control and planning,
• Neural networks,
• Induction and regression,
• Financial,
• Art,
• Databases,
• Algorithms,
• Natural language,
• Modules,
• Programming methods,
• Variations in genetic operations,
• Memory, state, and mental models, and
• Theoretical foundations.
F.1 DESIGN
F.1.1 Design of Stack Filters and Fitting Chaotic Data
Howard Oakley (1994) of the Institute of Naval Medicine in the United Kingdom considers two scientific applications of genetic programming.
In the first, Oakley compares a heuristic search method, the conventional
genetic algorithm, and genetic programming for developing a filter to remove noise from experimental data. The stack filter evolved by genetic programming appeared as the fittest answer and is in current use in a laser Doppler
rheometer system.
Oakley also used genetic programming to evolve equations to fit chaotic
time series data produced by the Mackey-Glass equations and certain physiological data.
F.2 PATTERN RECOGNITION AND CLASSIFICATION
F.2.1 Feature Discovery and Image Discrimination
Tackett (1993a, 1993b) of Hughes Missile Systems applied genetic programming to a difficult induction problem using data taken from the real world. A
comparative performance study was conducted against other well-known
methods of machine learning. Fitness cases comprised features computed
from a U.S. Army database of 512-by-640 pixel infrared images containing
tracked and wheeled vehicles, fixed- and rotary-wing aircraft, and air defense units in a cluttered terrain. The features could be computed from subregions containing either targets (e.g., tanks, aircraft) or clutter (e.g., rocks and
bushes). The fitness of an individual was based on its ability to discriminate
between these two categories. Fitness was computed using an in-sample set
of 2,000 fitness cases. The fitness of the best-of-generation program was reported using a larger out-of-sample set of 7,000 fitness cases in order to determine the ability of the evolved program to generalize with respect to data it
has not encountered in training.
In a first experiment, genetic programming was used to construct classifiers that processed feature vectors produced by a preexisting algorithm. In a
second experiment, genetic programming was allowed to form its own feature set directly from the primitive intensity measurements. Against the same
data sets, the results produced by genetic programming achieved better
performance than the results produced by an ID3-like decision tree classifier
and a multilayer perceptron trained using back propagation.
F.2.2 Pattern Recognition using Automatically Defined Features
Andre's (1994a) approach to a two-dimensional pattern recognition problem
involved evolving hit-or-miss feature-detecting matrices (using a two-dimensional version of the conventional genetic algorithm) while simultaneously
evolving a computer program (using genetic programming) to act on the hit-or-miss results reported by the feature detectors. The feature-detecting matrices were evolved using a crossover operator that exchanges randomly chosen
sub-matrices.
F.2.3 Upgrading Rules for an OCR System
One approach to optical character recognition involves writing detailed rules
for recognizing each possible character and each possible font. Andre (1994c)
successfully used genetic programming to upgrade handwritten rules when
new characters and new fonts must be processed.
F.2.4 Prediction of Secondary Structure of Proteins
Handley (1993a) used genetic programming to attempt to predict α-helices in
globular proteins. Each program was executed once for each residue along a
protein sequence. Each program was required to predict whether or not the
current residue was part of an α-helix. The programs had access to the Kyte-Doolittle hydrophobicity values (Kyte and Doolittle 1982) and a measure of
the bulk of the current residue. A turtle operation enabled an inspecting head
to wander to the left or right of the current residue and thereby obtain the
hydrophobicity and bulk values for neighboring residues. Many other efforts
at prediction of features of proteins inspect residues in a window of fixed size
around the current residue. In this approach, the extent of inspection of neighboring residues was not specified in advance, but was, instead, evolved.
Handley (1994d) evolved a program for detecting whether or not a protein
segment is an α-helix. The evolved program achieved an out-of-sample correlation of 0.48 on this version of the secondary structure prediction problem.
F.2.5 The Donut Problem
Tackett and Carmi (1994a) studied a classification problem involving two stochastic donuts interlocked like two links of a chain. The problem was to classify a given point in three-dimensional space as to the donut to which it
belonged. This classification problem is pathological for a number of reasons.
The mean of each probability distribution lies in the densest part of the other;
the distributions cannot be linearly separated by a perceptron rule; the
distributions cannot be covered by cones or hypercones; and they cannot be
enclosed by a pair of radial basis functions. Moreover, class membership is
inherently ambiguous because outlying points of each probability distribution intermingle with points of the other.
The donut problem has the advantage that its difficulty can be scaled in
a controlled manner in several ways. Tackett and Carmi genetically evolved
classification programs for versions of the donut problem with different
degrees of ambiguity of class membership, with different degrees of sparseness of data to test generalization, and with different "bites" removed from
the donut.
Tackett and Carmi also compared the effect of different breeding policies,
comparing demes (spatially distributed local breeding groups) with panmictic
breeding (where each individual is equally likely to breed with any other equally
fit individual). They also compared the effects of the steady state approach to
genetic algorithms (where the offspring produced by one application of one
genetic operation are immediately available to participate in subsequent genetic operations) with the generational approach (where offspring produced by
a large number of genetic operations are held aside until an entire new population is ready to replace the entire old population).
F.2.6 Evolution of a Model for a Jetliner
Nguyen and Huang (1994) evolved three-dimensional models for jetliners
designed for use in an object recognition system employing an interactive
fitness measure provided by the user.
F.3 ROBOTIC CONTROL
F.3.1 Crawling and Walking of a Six-Legged Creature
Beer (1990) demonstrated that it was possible for a human to design a
neural network involving a surprisingly modest number of neurons to
enable a simulated cockroach to crawl and walk. Brooks (1989) demonstrated that a human could design a controller written in the style of the
subsumption architecture to enable a similar six-legged artificial creature
to perform similar tasks.
Spencer (1993, 1994) used genetic programming to automatically generate
a program that enables a six-legged creature to crawl and walk. Three progressively more difficult versions of the problem were solved. The performance of the programs was analyzed using the gait of the walk of the robot in
terms of leg-dragging, balancing, and forward motion.
Spencer introduces a new constant perturbation operation which perturbs
random constants by a small, bounded, random percentage during a run of
genetic programming.
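One way to picture a constant perturbation operation of this general kind is the sketch below. It is an assumption-laden illustration and not Spencer's implementation: the function name perturb-constants, the uniform distribution, the choice to perturb every numeric constant in the tree, and the bound max-fraction are all hypothetical, and the built-in RANDOM is used in place of the Park-Miller randomizer of appendix E.

```lisp
(defun perturb-constants (tree max-fraction)
  "Returns a copy of TREE in which every numeric constant c is
   replaced by c * (1 + r), where r is drawn uniformly from
   -MAX-FRACTION up to (but not including) +MAX-FRACTION.
   MAX-FRACTION is assumed to be a positive float.  Function
   names and symbolic terminals are left unchanged."
  (cond ((consp tree)
         (cons (first tree)
               (mapcar #'(lambda (argument)
                           (perturb-constants argument
                                              max-fraction))
                       (rest tree))))
        ((numberp tree)
         (* tree (+ 1.0 (- (random (* 2.0 max-fraction))
                           max-fraction))))
        (t tree)))

;; Each constant moves by at most 10 percent of its value.
(perturb-constants '(+ x (* 2.0 y)) 0.1)
```

The shape of the program is preserved, so an operation like this adjusts numeric coefficients without disturbing the structure that crossover has already assembled.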
F.3.2 Evolution of Herding Behavior
Craig Reynolds, developer of the famous Boids video (Reynolds 1991), studied the question of whether coordinated group motion could evolve among a
population of critters using genetic programming (Reynolds 1993). A simulated two-dimensional environment contained critters, static obstacles, and a
predator. In order to survive, the critters had to steer a safe course through a
dynamic environment and avoid collisions with obstacles and each other.
The predator preferentially targeted stragglers, thus encouraging aggregation and herding behavior.
F.3.3 Obstacle-Avoiding Behavior
Reynolds (1994a) showed how noise can be used to promote robust solutions
to the problem of obstacle-avoiding behavior for a robot. Reynolds (1994b)
presented a vision-based model of obstacle-avoiding behavior for a robot.
F.3.4 Corridor-Following and the Lens Effect
In "The Difficulty of Roving Eyes," Reynolds (1994c) considered three versions of a problem calling for the discovery of a controller for a corridor-following robot. The robot had a roving sensor, an arbitrary static sensor, and
a predetermined static sensor in the three versions. The histograms of fitness
in the initial random generation were distinctly different for the three versions. These differences foreshadowed the difficulty of solving the problem
in the actual full runs of genetic programming and validate the existence of
the lens effect (chapter 26) for another problem domain.
F.3.5 Control of Autonomous Robots
Ghanea-Hercock and Fraser (1994) discussed the evolution of behavior-based
controllers for autonomous robot agents. Complex emergent behavior can
arise as a result of the interactions among low-level behaviors. As agents
attempt more complex problems, the number of interactions can increase
beyond the capacity of manual design. Ghanea-Hercock and Fraser used evolution to automate the process of designing controllers.
F.3.6 Evolution of Co-Operation among Autonomous Robots
Complex tasks can be performed either by a single very sophisticated device
or by a distributed collection of co-operating simpler devices. Rush, Fraser,
and Barnes (1994) discussed how to automate the design of a control architecture for complex tasks. They evolved a solution to a co-operative object relocation task that previously had been designed manually with a behavior
synthesis architecture.
F.3.7 Incorporating Domain Knowledge into Evolution
Fraser and Rush (1994) discussed ways of evolving artificial nervous systems
using the genetic algorithm and genetic programming. The aim was to produce control systems for multiple autonomous devices called BIRos (Biologically Inspired Robots), without explicit design. They discussed the ways
in which intelligent knowledge (INK) of the problem domain, as seen by
the designer, could be incorporated into the evolutionary mechanism. A
co-operative relocation task previously designed using manual methods
was used as the test problem.
F.3.8 Monitoring Strategy for Independent Agents
Independent agents, such as robots, need to acquire information about their
environment in order to perform their assigned tasks. Atkin and Cohen (1993a,
1993b, 1994) applied genetic programming to enable an independent agent to
learn a strategy for monitoring its environment.
F.3.9 Genetic Planner for Robots
Planning is the creation of computer programs that will be executed in the
future to control an independent agent, such as a robot. Handley (1993b, 1993c,
1994a) successfully applied genetic programming with automatically defined
functions to the creation of plans for the task of pushing three boxes together
and moving the robot to a specified location in another room. Handley
achieved an efficiency ratio, R_E, of 6.0 for performing the task of moving the
robot to a specified location in another room.
F.3.10 AI Planning Systems
Spector (1994) described a series of illustrative experiments in which
genetic programming was applied to traditional blocks-world planning
problems from the field of artificial intelligence. Spector discussed genetic
planning in the context of traditional artificial intelligence planning systems and commented on the costs and benefits to be
expected from further work.
F.4 NEURAL NETWORKS
F.4.1 Cellular Encoding of Neural Networks
There are, of course, numerous effective algorithms for training a neural network to solve a problem. Back propagation (Rumelhart, Hinton, and Williams
1986) is the most widely used such algorithm.
Neural networks are complex structures that can be represented by line-labeled, point-labeled, directed graphs. The points may be input points, output points, or neural processing units within the network. The lines are labeled
with weights to represent the weighted connections between two points. The
neural processing units are labeled with numbers indicating the threshold
and bias of the unit.
The conventional genetic algorithm operating on fixed-length character
strings has been used to discover the weights for neural nets (Miller, Todd,
and Hegde 1989; Belew, McInerney, and Schraudolph 1991; Whitley,
Starkweather, and Bogart 1990; Wilson 1990). Typically, the weights (and perhaps also the thresholds and biases) in the network are concatenated into a
long string (chromosome) of bits (or sometimes floating-point numbers); the
genetic algorithm then operates on this linear structure in the usual way. Superficially, the conventional genetic algorithm provides an attractive approach
for searching the highly nonlinear multidimensional search space of weight
vectors. The genetic algorithm seems especially appropriate when recurrent
neural networks (i.e., non-feed-forward networks with memory and state)
are involved (Jefferson et al. 1991) because of the scarcity of methods for discovering the weights of a recurrent neural network.
Simultaneous discovery of both the architecture and weights of a neural
network has also been attempted using genetic programming (Genetic Programming, section 19.9).
However, the continuing difficulty in applying genetic methods for
designing neural networks has centered on the problem of finding a manipulable representation for the line-labeled, point-labeled, directed graph representing the neural network that is crossover-friendly and congenial to the
neural net problem domain.
Gruau (1992a, 1992b, 1993a, 1993b, 1994a, 1994b) and Gruau and Whitley
(1993a, 1993b) dealt with this difficulty. Instead of applying genetic methods
to entities that attempt to directly represent the identifiable parts of the neural
network, Gruau's clever and innovative cellular encoding technique applied
genetic programming to program trees that specify how the neural net was to
be constructed.
In Gruau's scheme, each individual (program tree) in the genetic population is a composition of network-constructing, neuron-creating, and neuron-adjusting functions and terminals. Each of Gruau's program trees in the
population is one step removed from the actual neural network. The program tree is the genotype and the neural network constructed in accordance
with the tree's instructions is the phenotype. The fitness of an individual program tree in the population is measured in terms of how well the neural network that is constructed in accordance with the instructions contained in the
program tree performs the desired task. Genetic programming then breeds
the population of program trees in the usual manner.
The construction process for a neural network starts from an embryonic
neural network consisting of a single neuron. This embryonic neuron has a
threshold of 0; its input is connected to all of the network's input nodes with
connections with weights of +1; its output is connected to all of the network's
output nodes.
The network-constructing functions in the program tree then specify how
to grow the single embryonic neuron into the full neural network. Certain
network-constructing functions permit a particular neuron to be subdivided
in a parallel or sequential manner. Other neuron-adjusting functions can
change the threshold of a neuron, the weight of a connection, or the bias on a
neuron. A pointer links the current operation in the program tree to a current
point in the developing neural network so as to give specificity to the current
operation.
In addition, Gruau extends his basic scheme to permit recursions which, in
turn, permit neural networks to be generated for high-order parity, symmetry, and other functions.
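
The developmental idea can be illustrated with a minimal Python sketch. It is not Gruau's instruction set: the instruction names SEQ, PAR, and END, the tuple format of the program tree, and the function name develop are all assumptions made for this illustration, and weights, thresholds, biases, and recursion are omitted. The sketch shows only how a program tree can grow a single embryonic neuron into a larger directed graph.

```python
import itertools

def develop(tree, inputs, outputs):
    """Grow a set of directed edges (source, target) from one embryonic neuron."""
    fresh_ids = itertools.count(1)
    # Embryonic neuron 0: fed by every input, feeding every output.
    edges = {(i, 0) for i in inputs} | {(0, o) for o in outputs}
    pending = [(0, tree)]                 # (neuron id, subtree it still has to run)
    while pending:
        cell, node = pending.pop()
        op = node[0]
        if op == "END":                   # this neuron stops developing
            continue
        if op not in ("SEQ", "PAR"):
            raise ValueError(f"unknown instruction: {op}")
        child = next(fresh_ids)
        incoming = [s for (s, d) in edges if d == cell]
        outgoing = [d for (s, d) in edges if s == cell]
        if op == "SEQ":
            # Sequential division: the child inherits the outgoing edges,
            # and the parent now feeds only the child.
            for d in outgoing:
                edges.discard((cell, d))
                edges.add((child, d))
            edges.add((cell, child))
        else:
            # Parallel division: the child duplicates the parent's edges.
            edges.update((s, child) for s in incoming)
            edges.update((child, d) for d in outgoing)
        pending.append((cell, node[1]))   # parent continues with the left subtree
        pending.append((child, node[2]))  # child continues with the right subtree
    return edges

# One sequential split followed by a parallel split of the first neuron
# yields two hidden neurons (0 and 2) feeding a single neuron (1).
net = develop(("SEQ", ("PAR", ("END",), ("END",)), ("END",)), ["x0", "x1"], ["y"])
```

In this sketch the program tree is the genotype and the returned edge set is the phenotype; a fitness measure would be applied to the network, not to the tree.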
F.4.2 Synthesis of Sigma-pi Neural Networks
Zhang and Muhlenbein (1994) described the breeder genetic programming
method incorporating parsimony (Occam's razor) in its fitness measure. They
applied this method to the synthesis of sigma-pi neural networks, which contain multiplicative processing elements in addition to the usual additive processing elements.
F.4.3 New Learning Rules for Neural Networks
Learning mechanisms for neural networks adjust the synaptic weights of a
neural network according to some rule. Bengio, Bengio, and Cloutier (1994)
used genetic programming to discover the form as well as the numerical parameters for such rules. Their experiments involving 20 two-dimensional
classification problems (half of them linearly separable) suggested that genetic programming found a better learning rule for the particular problems
tested than simulated annealing, the conventional genetic algorithm, or
backpropagation. The genetically evolved learning rule bore some resemblance
to backpropagation. The evolved rule generalized to the seven-input LED
identification task.
F.5 INDUCTION AND REGRESSION
F.5.1 Induction of Regular Languages
Dunay, Petry, and Buckles (1994) considered the problem of discovering a
regular language from examples of sentences known to be in an unknown
language and sentences known not to be in that language. They proceeded
by translating deterministic finite automata to binary trees and binary trees
to S-expressions.
F.5.2 Levenberg-Marquardt Regression
Jiang (1992, 1993) and Jiang and Wright (1992) described a system for symbolic regression that combined the Levenberg-Marquardt regression algorithm
with genetic programming.
F.5.3 Multiple Steady States of a Dynamical System
Lay (1994) used genetic programming to analyze the multiple steady states of
a dynamical system for a continuously stirred tank reactor.
F.5.4 Inverting and Co-Evolving Randomizers
Jannink (1994) attempted to unravel the structure of several random number
generators by using co-evolution in which their previous outputs were used
to predict their future outputs. He also co-evolved populations of randomizing programs to play a game similar to the penny-matching game.
F.5.5 Adaptive Learning using Structured Genetic Algorithms
Hitoshi Iba and his colleagues at the Electrotechnical Laboratory in Japan
have published a number of papers on structured genetic algorithms and
genetic programming.
Iba and Sato (1992) discussed meta-level strategy learning for structured
genetic algorithms. Iba, de Garis, and Higuchi (1993) described the adaptive
learning of structured classifiers for foraging using structured genetic
algorithms.
F.5.6 Minimum Description Length and Group Method of
Data Handling
In addition to the work described above, Iba and his colleagues at the
Electrotechnical Laboratory in Japan have published three papers on solving
system identification (symbolic regression) problems.
Iba, Kurita, de Garis, and Sato (1993) introduced STROGANOFF (Structured Representation On Genetic Algorithms for Non-linear Function Fitting)
for solving system identification (symbolic regression) problems. Fitness was
measured using the minimum description length (MDL) principle. The Group
Method of Data Handling (GMDH), developed by Ivakhnenko (1971), was
used as a basis for their technique. Genetic programming was used to efficiently explore the space of possible GMDH solutions. This is an interesting
example of genetic programming being used to tie together several existing
powerful techniques.
This approach was applied to the Mackey-Glass equations and a pattern
recognition problem.
Iba, de Garis, and Sato (1994) also applied the minimum description length
(MDL) principle to the problem of finding a decision tree for the Boolean
multiplexer function. The results were again compared with the results produced by GMDH.
Further work involving minimum description length was reported in Iba,
Sato, and de Garis (1994) and Iba and Sato (1994).
F.5.7 Sequence Induction
Jones (1991) described the induction of mathematical formulae representing
observed sequences produced by single-parameter numeric functions. His
program had a feature permitting an additional example (a pair of values of
the independent variable and dependent variable) to be presented after it
had evolved a formula. His program would then either confirm that the new
example was consistent with the evolved program or would restart the evolutionary process using the enlarged set of examples.
F.6 FINANCIAL
F.6.1 Horse Race Prediction
Programs for making predictions in the real world typically have an enormous number of inputs. Perry (1994) described work on the prediction of
horse races using genetic programming. Evolution of such predicting programs was facilitated by enriching the population with individuals bred
off-line in preliminary runs.
F.6.2 Double Auction Market Strategies
Since 1990, the Santa Fe Institute has run a double auction tournament using
a mechanism similar to that used in the minute-by-minute trading of commodity and futures exchanges. The participants in this market are strategies
embodied in computer programs written and submitted by economists, mathematicians, and computer scientists from around the world. In addition, human players have been competing against automated players over the Internet
on the Arizona Token Exchange. Andrews and Prager (1994) used genetic
programming to create strategies for such double auction tournaments. The
strategies were compared to those created by simulated annealing.
F.6.3 C++ Implementation
Andrew Singleton of Creation Mechanics Inc. in Dublin, New Hampshire,
is applying genetic programming to financial analysis. Singleton (1994)
described his GPQUICK implementation of genetic programming in C++.
F.7 ART
F.7.1 Interactive Evolution of Equations of Images
Karl Sims, developer of the famous Panspermia video (Sims 1991b), has shown
that a spectacular variety of color images can be produced by selecting images from a large number of randomly created and mutated programs displayed by an interactive workstation (Sims 1991a, 1992a, 1992b, 1993a, 1993b).
In this approach employing interactive fitness, the human evaluates the current images and interactively selects the preferred image.
Sims' (1993b) interactive method has recently been displayed in the Georges
Pompidou museum in Paris. Sims interactively produced the genetic art that
appears on the cover of Genetic Programming and this book.
F.7.1.1 Genetic Art in Virtual Reality
Das et al. (1994) extend Sims' work by evolving genetic art which the viewer
can walk around, examine, and manipulate using virtual reality.
F.7.2 Jazz Melodies from Case-Based Reasoning and Genetic Programming
Spector and Alpern (1994) apply case-based reasoning and genetic programming in a system that produces new bebop jazz melodies from a case base of melodies. Genetic programming was driven by user-provided
evaluation.
F.8 DATABASES
F.8.1 News Story Classification by Dow Jones
Editors at Dow Jones must assign one or more of about 350 codes daily to
thousands of news stories originating from newspapers, magazines, news
wires, and press releases. Brij Masand (1994) of Thinking Machines Corporation used the massively parallel Connection Machine to implement a
memory-based reasoning (MBR) system for encoding news stories. Genetic
programming was used to evolve a program which predicted the classification accuracy of the memory-based reasoning approach.
F.8.2 Building Queries for Information Retrieval
Kraft et al. (1994) viewed Boolean queries for information retrieval as a parse
tree and used genetic programming to improve the formulation of Boolean
queries by means of relevance feedback.
F.9 ALGORITHMS
F.9.1 Evolution of the Schedule for Simulated Annealing
Simulated annealing (Kirkpatrick, Gelatt, and Vecchi 1983; Aarts and Korst
1989; van Laarhoven and Aarts 1987) is a probabilistic optimization technique
that is often applied to highly nonlinear multidimensional search spaces. Simulated annealing attempts to find the global optimum for the energy level (fitness) among all the points of the search space.
Simulated annealing operates over a series of discrete time steps (generations). The process is controlled by an annealing schedule which changes the
temperature parameter, T, in a specified way as a function of the time step.
Simulated annealing starts with a single initial user-defined domain-specific structure. The energy (a zero-based measure comparable to standardized fitness) is measured for the current structure.
There is a user-defined probabilistic method for modifying (mutating) the
current structure. At each step of the process, a modification is probabilistically
created from the existing structure and the energy level of the single new
structure is determined.
The Metropolis algorithm is used to select between the new modified structure and the old structure. One of the two will be retained for the next time
step. If the energy level of the modification is an improvement, the modification is always greedily accepted. However, if the energy level of the modification is not an improvement, the modification may still be accepted with a
certain probability determined by the Boltzmann equation. This probability
of acceptance is greater if the energy difference is small and the probability of
acceptance is greater if the temperature parameter, T, is high. Simulated annealing differs from hillclimbing in that the observed better alternative is not
always adopted as the next point in the search space.
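
The acceptance rule just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from any of the cited works: the function names metropolis_accept and anneal, the convention that lower energy is better, and the form exp(-dE/T) for the Boltzmann probability (with the constant absorbed into T, and T assumed positive) are choices made for this sketch.

```python
import math
import random

def metropolis_accept(old_energy, new_energy, temperature, rng=random):
    """Decide whether the modified structure replaces the current one."""
    if new_energy <= old_energy:
        return True                       # improvements are always accepted
    # A worse structure is accepted with Boltzmann probability exp(-dE / T):
    # more likely for a small energy difference or a high temperature.
    delta = new_energy - old_energy
    return rng.random() < math.exp(-delta / temperature)

def anneal(initial, energy, mutate, schedule, steps, seed=0):
    """Run simulated annealing; schedule(t) gives the temperature at step t."""
    rng = random.Random(seed)
    current, current_energy = initial, energy(initial)
    for t in range(steps):
        candidate = mutate(current, rng)
        candidate_energy = energy(candidate)
        if metropolis_accept(current_energy, candidate_energy, schedule(t), rng):
            current, current_energy = candidate, candidate_energy
    return current, current_energy

# Example: minimize x*x over the integers with an exponentially
# decreasing schedule (any positive function of t could be supplied).
best, best_energy = anneal(
    10,
    energy=lambda x: x * x,
    mutate=lambda x, rng: x + rng.choice([-1, 1]),
    schedule=lambda t: 0.95 ** t,
    steps=200,
)
```

Because the schedule is passed in as an ordinary function of the time step, a non-monotonic schedule can be tried simply by supplying a different function.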
The temperature specified by the annealing schedule plays a very important role in the process. If the annealing schedule is monotonically decreasing, a non-improving modification will be less likely to be accepted in a later
generation of the process.
A monotonically decreasing annealing schedule is conventionally used in
applying simulated annealing to specific problems. This practice appears to
be a consequence of the fact that an exponentially decreasing annealing schedule is used in the mathematical proof of an important existence theorem in
the field of simulated annealing. However, the theorem involved does not
address the question of whether a monotonically decreasing annealing schedule is either best or required for a practical problem. In spite of the almost universal conventional practice of using a monotonically decreasing annealing
schedule, the nature of the optimal annealing schedule is, in fact, an open
question. There is no mathematical justification for requiring the use of a
monotonically decreasing annealing schedule for a practical problem.
Thonemann (1992, 1994) applied genetic programming to finding an optimal annealing schedule for controlling runs of simulated annealing for benchmark examples of the quadratic assignment problem (QAP). Thonemann
found that a variety of evolved oscillatory annealing schedules are superior
to the usual monotonically decreasing annealing schedule.
F.9.2 Sorting Programs
Kinnear (1993a, 1993b) used genetic programming to successfully evolve general iterative sorting algorithms employing various sets of primitive functions. He also explored the differences in difficulty created by the use of
different primitive functions.
O'Reilly and Oppacher (1992) applied genetic programming to the task of
evolving generalized sorting algorithms and explored the difficulty of this
task in some detail.
Ryan (1994) used the problem of evolving a minimal sorting network to
show the advantages of disassortative mating in reducing premature convergence in genetic algorithms and genetic programming.
F.10 NATURAL LANGUAGE
F.10.1 Word Sense Disambiguation
Siegel (1994) used genetic programming to induce decision trees that determine the meaning of a word by looking at the context in which it is used
(word sense disambiguation). He developed a method for evolving decision
trees that had sets of values on each arc; the sets were represented using bit
strings; and bit-string crossover was intermingled with the subtree-swapping
of genetic programming.
Siegel showed that genetic programming benefited from the competitive
co-evolution of training data, as first developed by Hillis (1990, 1991). In particular, he developed a method by which a fixed set of training examples
(collected empirically) could competitively co-adapt against the decision trees.
F.10.2 Classification of Swedish Words
Nordin (1994) developed an extremely fast version of genetic programming
in which the programs were composed of low-level binary machine code. He
applied it to the problem of classifying spelled-out Swedish words as nouns
or pronouns.
F.11 MODULES
F.11.1 Module Acquisition and the Genetic Library Builder
Angeline and Pollack (1992, 1994) have developed a tree compression operation that begins by choosing a point in a program tree and identifying the
portion of the tree lying within a specified distance below the chosen point. If
all branches of the portion of the program tree thus identified terminate
with a terminal within the specified distance, the portion is defined as a
newly acquired module taking no arguments, and the portion is replaced
by a zero-argument call to the newly acquired module. This process is
identical to the encapsulation operation (Genetic Programming, subsection
6.5.4). In the more interesting case, if there are terminals or subtrees "hanging out" below the portion of the program tree thus identified that lie
outside the specified distance, the portion is defined as a newly acquired
module taking as many arguments as there are terminals or subtrees below the portion of the program tree thus identified, and the portion is
replaced by a parameterized call to the newly acquired module. The newly
acquired modules are collected by a genetic library builder (GLib). This
module acquisition (MA) operation provides a means to create subroutines with arguments that are defined dynamically during a run of genetic programming. Angeline and Pollack then applied genetic
programming to tasks such as Tic-Tac-Toe.
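
The compression step can be sketched in Python under some simplifying assumptions. The names compress and expand, the representation of program trees as nested tuples with string terminals, and the parameter names arg0, arg1, and so on are hypothetical choices for this illustration and are not taken from Angeline and Pollack's implementation. Terminals are assumed never to be named like parameters.

```python
def compress(tree, depth):
    """Split `tree` at `depth` levels below its root.

    Returns (module_body, arguments): anything found at the cut-off depth
    is removed, becomes an argument, and is replaced by a parameter name.
    """
    arguments = []

    def cut(node, remaining):
        if remaining == 0:                    # lies outside the specified distance
            arguments.append(node)
            return f"arg{len(arguments) - 1}"
        if not isinstance(node, tuple):       # terminal inside the module
            return node
        return (node[0],) + tuple(cut(c, remaining - 1) for c in node[1:])

    return cut(tree, depth), arguments

def expand(body, arguments):
    """Undo compress by substituting the arguments for the parameters."""
    if isinstance(body, str) and body.startswith("arg"):
        return arguments[int(body[3:])]
    if not isinstance(body, tuple):
        return body
    return (body[0],) + tuple(expand(c, arguments) for c in body[1:])

tree = ("AND", ("OR", "D0", "D1"), ("NOT", ("OR", "D1", "D2")))
body, args = compress(tree, 2)
# body is ("AND", ("OR", "arg0", "arg1"), ("NOT", "arg2")), a three-argument module
# args is ["D0", "D1", ("OR", "D1", "D2")]
```

When every branch ends in a terminal before the cut-off depth, the argument list is empty and the module takes no arguments, which corresponds to the encapsulation case described above.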
Angeline and Pollack (1993a, 1993b) explored the advantages of competitive fitness measures for handling complex tasks. Co-evolution can be
implemented where each individual in the population competes with every other individual, where there is bipartite competition between pairs
of individuals, and where there is a multi-level tournament in which the
winner of the competition between pairs of individuals at one level of the
tournament competes with other winners at the next higher level of the
tournament.
Angeline (1994a, 1994b) explored how the concept of emergent intelligence could be implemented using a number of evolutionary algorithms,
including genetic algorithms, genetic programming, evolution strategies,
and evolutionary programming. Knowledge-based symbolic artificial intelligence relies on internal representations of the task environment. The fact
that these internal representations are inside the independent agents leads to
the well-known problems of AI, including brittleness, learnability,
knowledge acquisition, memory indexing, and credit allocation. These problems may be reduced or eliminated if the agent is allowed to interact
directly with its task environment. In what Angeline calls "emergent
intelligence," task-specific knowledge emerges from the interaction of the
agent and the task environment.
Angeline (1994c) is an overview of genetic programming that describes genetic programming's flexibility to tailor the representation language to the
problem being solved, and how its specially designed crossover operator provides a robust tool for evolving problem solutions. This paper provides an
introduction to genetic programming, a short review of dynamic representations used in other evolutionary systems and their relation to genetic
programming, and a description of some of genetic programming's
inherent properties.
F.11.2 Modules and Automatically Defined Functions
Kinnear (1994b) compared automatically defined functions (such as are described in this book) with module acquisition (described in subsection F.11.1)
using the even-4-parity problem. He explored why automatically defined functions yielded significant speedup for this problem while the approach using
module acquisition did not. Kinnear determined that the speedup was due to
a particular form of structural regularity in even-parity problems that was
exploited by automatically defined functions, but was not exploited by module acquisition. Kinnear invented a novel crossover operator, called modular
crossover, that provided much of the speedup provided by automatically
defined functions on the even-4-parity problem without the use of automatically defined functions.
F.11.3 Learning by Adapting Representations
Rosca and Ballard (1994a) demonstrated how genetic programming could
take advantage of its own search traces and thereby discover useful genetic
material to accelerate the search process. The newly discovered genetic material could be used to restructure the search space so that solutions could be
more easily found.
Rosca and Ballard (1994b) discussed constructive induction, minimum description length, and learning. Their approach to automatic discovery of functions in genetic programming was based on the discovery of useful building
blocks by analyzing the evolution trace, generalizing blocks to define new
functions, and finally adapting the problem representation on-the-fly. Adaptation of the representation determined a hierarchical organization of the extended function set which enabled a restructuring of the search space so that
solutions could be found more easily. Measures of complexity of solution
trees were defined for an adaptive representation framework (e.g., structural,
evaluational, descriptional, and expanded structural complexity). The minimum description length principle was applied to justify the feasibility of approaches based on a hierarchy of discovered functions and to suggest
alternative ways of defining a problem's fitness function.
F.12 PROGRAMMING METHODS
F.12.1 Directed Acyclic Graphs for Representing Populations of
Programs
Handley (1994b) presented a technique that reduced the time and space requirements for representing the programs in the population. The population
of parse trees was stored as a directed acyclic graph (DAG), rather than as a
forest of trees. Space was saved by not duplicating the storage of structurally
identical subtrees. Time was also saved because the contribution toward fitness
of each subtree could be cached both within a generation and between
generations.
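
The sharing and caching described above can be sketched in Python. This is an illustration only, not Handley's implementation: the class name DagPopulation, the two-function FUNCTIONS table, the nested-tuple tree format, and the restriction of terminals to named variables are all assumptions made for this sketch.

```python
import operator

# Hypothetical function set used only for this illustration.
FUNCTIONS = {"+": operator.add, "*": operator.mul}

class DagPopulation:
    """Stores program trees with structurally identical subtrees shared."""

    def __init__(self, cases):
        self.cases = cases        # fitness cases: dicts of variable name -> value
        self.nodes = {}           # structure -> the single shared copy
        self.values = {}          # shared node -> its outputs over all cases
        self.evaluations = 0      # number of non-cached subtree evaluations

    def intern(self, tree):
        """Return the shared copy of `tree`, creating it if necessary."""
        if not isinstance(tree, tuple):
            return tree
        node = (tree[0],) + tuple(self.intern(c) for c in tree[1:])
        return self.nodes.setdefault(node, node)

    def outputs(self, node):
        """Outputs of `node` over all fitness cases, computed at most once."""
        if not isinstance(node, tuple):
            return tuple(case[node] for case in self.cases)
        if node not in self.values:
            self.evaluations += 1
            arguments = [self.outputs(c) for c in node[1:]]
            function = FUNCTIONS[node[0]]
            self.values[node] = tuple(function(*xs) for xs in zip(*arguments))
        return self.values[node]

population = DagPopulation([{"x": 1, "y": 2}, {"x": 3, "y": 4}])
a = population.intern(("+", ("*", "x", "y"), "x"))
b = population.intern(("*", ("*", "x", "y"), ("*", "x", "y")))
# The subtree (* x y) is stored once and evaluated once for both programs.
```

Because the cached values depend only on the subtree and the fixed fitness cases, they remain valid from one generation to the next as long as the fitness cases do not change.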
F.12.2 Co-Routine Execution Model
Maxwell (1994) expanded the methodology of genetic programming with a
co-routine model for the synchronous, parallel execution of the individual
programs in the population. Maxwell's approach allowed the removal of arbitrary time-out limits on execution time for problems that permit monitoring of progress (change in fitness) during the execution of programs working
toward a solution.
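
One way to picture such a model is with Python generators standing in for co-routines. The sketch below is an assumption-laden illustration and not Maxwell's implementation: the name run_population, the patience parameter, and the convention that each program yields its current fitness (lower is better) after every step are hypothetical. Programs are advanced in lockstep, and a program is retired when its fitness stops improving instead of when a fixed time limit expires.

```python
def run_population(programs, patience):
    """Advance all programs in lockstep and retire those that stall.

    `programs` maps a name to a generator that yields the program's
    current fitness (lower is better) after each execution step.
    """
    best = {name: None for name in programs}
    stalled = {name: 0 for name in programs}
    active = dict(programs)
    while active:
        for name in list(active):
            try:
                fitness = next(active[name])      # run one step of this program
            except StopIteration:
                del active[name]                  # the program finished by itself
                continue
            if best[name] is None or fitness < best[name]:
                best[name], stalled[name] = fitness, 0
            else:
                stalled[name] += 1
                if stalled[name] >= patience:     # no progress: stop running it
                    del active[name]
    return best

def countdown(start, floor):
    """Toy program whose fitness improves by 1 per step until it hits `floor`."""
    fitness = start
    while True:
        yield fitness
        fitness = max(floor, fitness - 1)

result = run_population({"a": countdown(5, 0), "b": countdown(3, 2)}, patience=2)
```

Here neither toy program ever terminates, yet the run ends because both are retired once their fitness has failed to improve for two consecutive steps.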
F.12.3 Stack-Based Virtual Machine
Perkis (1994) described a new and more efficient implementation of genetic
programming using a stack-based virtual machine.
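
The general idea of executing a linear program on a stack can be sketched in Python as follows. The instruction names, the OPS table, and in particular the convention that an instruction is skipped when the stack holds too few arguments are assumptions made for this illustration; they should not be read as a description of Perkis's system.

```python
import operator

# Hypothetical instruction set: name -> (number of arguments, function).
OPS = {
    "ADD": (2, operator.add),
    "MUL": (2, operator.mul),
    "NEG": (1, operator.neg),
}

def run(program, x):
    """Execute a linear program on a stack and return the top of the stack."""
    stack = []
    for token in program:
        if token == "X":
            stack.append(x)                       # push the input variable
        elif token in OPS:
            arity, function = OPS[token]
            if len(stack) >= arity:               # otherwise skip the instruction
                arguments = [stack.pop() for _ in range(arity)][::-1]
                stack.append(function(*arguments))
        else:
            stack.append(token)                   # push a numeric constant
    return stack[-1] if stack else None

value = run(["X", "X", "MUL", 1, "ADD"], 3)       # computes x*x + 1 for x = 3
```

Under the skipping convention assumed here, any sequence of tokens is an executable program, so a crossover that exchanges arbitrary segments of two programs always produces executable offspring.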
F.13 VARIATIONS IN GENETIC OPERATIONS
F.13.1 Context-Preserving Crossover
D'haeseleer (1994) described two versions of a new context-preserving crossover operation and tested their performance on four problems: the obstacle-avoiding robot problem (chapter 13), the Boolean 11-multiplexer, the central
place food-foraging problem, and an iterated version of the obstacle-avoiding robot problem. He found that a mix of strong context-preserving crossover with ordinary crossover was superior to ordinary crossover alone in this
testbed of problems.
F.13.2 Brood Selection and Soft Selection
In nature it is common for organisms to produce many offspring and then
neglect or eat some of them or allow them to eat each other. This brood selection (soft selection) reduces the parent's investment of resources in offspring
that are potentially less fit than others. Tackett and Carmi (1994b) showed
that brood selection could benefit genetic programming by conserving computer time and memory during runs. Altenberg (1994) argues that brood selection benefits the evolvability under recombination in genetic programming.
F.13.3 Implementation in C++
Keith and Martin (1994) discussed how to maximize efficiency and flexibility
in an implementation of genetic programming in C++.
F.13.4 Effect of Locality
D'haeseleer and Bluming (1994) demonstrated the beneficial effect of isolation, based on distance, on the performance of genetic programming on a
game involving simulated robot tanks. In other work, the structure of the
demes (isolated group of individuals in the population) was predetermined
in advance, whereas the demes spontaneously emerged in the work of
D'haeseleer and Bluming.
F.13.5 Biologically Motivated Representation of Programs
Banzhaf (1993) described a method of representing programs employing binary strings and a set of biologically motivated operations (transcription, repair, editing, and linking) for such strings.
F.13.6 Niches
Abbott (1991) evolved partial solutions to a problem. These partial solutions
defined niches which could then be combined to yield complete solutions to
the overall problem. This approach yielded significant speedup.
F.13.7 Recombination and Selection
Tackett (1994) studied the effects of recombination and selection in genetic
programming, performed significant testing of genetic programming in the
context of induction, and introduced several new methods and operators.
Tackett characterized genetic programming as a search of the space of computer programs and compared it to alternative methods such as Tierra and
FOIL. Tackett compared genetic programming to other methods of machine
learning using a problem of image discrimination.
Many natural organisms overproduce zygotes and subsequently cull the
offspring at some later stage of development. This brood selection is done in
order to reduce parental resource investment in inferior offspring. Tackett
introduced a brood recombination operator which was parameterized by the
brood size and a brood culling function. Tackett characterized the computational investment of CPU and memory resources in terms of brood size, brood
fitness evaluation cost, and the fitness evaluation cost for full-fledged population members. Tackett showed that the brood recombination operator performs a greedy search of potential recombination sites (as opposed to the
random search of recombination sites performed in standard genetic programming). Subsequent tests of the brood recombination operator demonstrated that by using smaller population sizes with large broods, equivalent
or improved performance could be achieved using the brood recombination
operator, while reducing the CPU and memory requirements relative to genetic programming with standard (random) recombination.
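
A brood recombination operator of this general kind can be sketched in Python. The sketch rests on several assumptions that are not taken from Tackett's work: the function names, the use of fixed-length lists and one-point crossover as a stand-in for program trees and subtree crossover, the convention that a lower culling cost is better, and the choice to keep exactly two survivors.

```python
import random

def one_point_crossover(a, b, rng):
    """Stand-in for subtree crossover: swap the tails of two equal-length lists."""
    cut = rng.randrange(1, len(a))
    return [a[:cut] + b[cut:], b[:cut] + a[cut:]]

def brood_recombination(parent_a, parent_b, crossover, cull_cost, brood_size, rng):
    """Cross the same two parents `brood_size` times and keep the best two.

    `cull_cost` plays the role of the brood culling function; it may be a
    cheaper approximation of the full fitness measure (lower is better).
    """
    brood = []
    for _ in range(brood_size):
        brood.extend(crossover(parent_a, parent_b, rng))
    brood.sort(key=cull_cost)            # greedy choice among the sampled sites
    return brood[:2]

rng = random.Random(0)
survivors = brood_recombination([0] * 6, [1] * 6, one_point_crossover, sum, 5, rng)
```

Only the two survivors would then be evaluated with the full fitness measure and inserted into the population, which is where the trade-off among brood size, culling cost, and full evaluation cost arises.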
Tackett also presented a new class of constructional problems in which fitness was based strictly on the syntactic form of expressions rather than semantic evaluation: a certain target expression was assigned perfect fitness
while those subexpressions resulting from its hierarchical decomposition had
intermediate fitness values. This problem allowed precise control over the
structure of the search space thereby providing a mechanism with which to
test the search properties of operators. Four problems were constructed, analogous to the Royal Road and deceptive problems previously applied to binary-string genetic algorithms. Greedy and random recombination methods
were tested in combination with several selection methods.
A criticism of connectionist learning by the symbolic AI community is that
neural methods are opaque with respect to providing insight into what they
have learned about a problem. Machine learning has successfully produced
alternative systems which can learn parsimonious symbolic rules of induction through hill climbing. Other work has shown how such learning may be
readily integrated with preexisting expert knowledge. Genetic programming
is likewise a symbolic method of induction, and so has potential to feed symbolic
knowledge about what it has learned back into the user environment.
Among the potential advantages are that the genetic search may be more
powerful than other methods applied in symbolic learning to date. It has
been observed that genetically induced programs do not yield readily to inspection for many problems. Tackett introduced a "gene banker's algorithm"
that hashes all expressions and subexpressions (traits) occurring in the population in a time linearly proportional to the number of functions and terminals in the population. A variety of statistics including conditional (schema)
fitness of each trait was computed and tracked over time. After a run completed, the collection of traits could be mined in order to try to determine
which traits and relationships were salient. For the purpose of this simple
experiment, traits were primarily extracted by sorting on conditional fitness
and on frequency of occurrence. It was demonstrated that for simple problems
the extraction of salient expressions was readily achievable, while for more
difficult induction problems it was problematic. Hitchhiking (which Tackett
defines as the artificial inflation of fitness estimates for useless expressions
which embed themselves among salient expressions) was shown to be a primary confounding factor in this analysis. Tackett concluded by discussing
how more advanced methods of analysis could be applied to the mining of
genetic traits.
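
The bookkeeping involved in banking traits can be sketched in Python. This is an illustration under stated assumptions and not Tackett's algorithm: the function name bank_traits, the nested-tuple tree format, and the definition of conditional fitness as the mean fitness of the individuals that contain a trait are choices made for this sketch. In addition, hashing a nested tuple takes time proportional to its size, so this sketch does not achieve the linear-time bound mentioned above.

```python
from collections import defaultdict

def bank_traits(population):
    """Tabulate every expression and subexpression (trait) in the population.

    `population` is a list of (tree, fitness) pairs. The result maps each
    trait to (number of occurrences, mean fitness of individuals carrying it).
    """
    occurrences = defaultdict(int)
    carriers = defaultdict(int)
    fitness_sum = defaultdict(float)

    def visit(node, seen):
        occurrences[node] += 1
        seen.add(node)
        if isinstance(node, tuple):
            for child in node[1:]:
                visit(child, seen)

    for tree, fitness in population:
        seen = set()                      # traits carried by this individual
        visit(tree, seen)
        for trait in seen:
            carriers[trait] += 1
            fitness_sum[trait] += fitness
    return {t: (occurrences[t], fitness_sum[t] / carriers[t]) for t in occurrences}

traits = bank_traits([
    (("+", "x", ("*", "x", "y")), 1.0),
    (("*", "x", "y"), 3.0),
])
# Sorting traits by conditional fitness or by frequency is then one line:
by_frequency = sorted(traits, key=lambda t: traits[t][0], reverse=True)
```

In this toy population the trait (* x y) inherits the fitness of both individuals that carry it, which also hints at how a useless expression embedded among salient ones could have its estimate inflated.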
F.13.8 Strongly Typed Genetic Programming
Montana (1993) addressed the requirement of genetic programming that all
the variables, constants, arguments for functions, and values returned from
functions must be of the same data type. Montana dealt with the difficulties
imposed by the closure requirement by introducing a variation of genetic
programming called strongly typed genetic programming (STGP). In STGP,
variables, constants, arguments, and returned values can be of any data type
with the provision that the data type for each such value be specified beforehand. Consequently, the initialization process and the genetic operators only generate syntactically correct parse trees. Generic functions and
generic data types are key concepts for STGP. Generic functions are not
true strongly typed functions, but rather are templates for classes of such
functions. Generic data types are analogous. To illustrate STGP, Montana
presented four examples involving vector and matrix manipulation and
list manipulation. The first was a multi-dimensional least-squares regression problem; the second was a multi-dimensional Kalman filter problem;
the third was the list manipulation function NTH; and the fourth was the
list manipulation function MAPCAR.
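The idea that initialization generates only well-typed parse trees can be illustrated with a small Python sketch. The primitive set (`add`, `dot`, `scale`), the two type names, and the functions `grow` and `type_of` are hypothetical and chosen only for this example; Montana's generic functions and generic data types are not modeled here.

```python
import random

# Hypothetical typed primitive set: name -> (return type, argument types).
FUNCTIONS = {
    "add": ("float", ("float", "float")),
    "dot": ("float", ("vector", "vector")),
    "scale": ("vector", ("float", "vector")),
}
# Hypothetical typed terminals: name -> type.
TERMINALS = {"x": "float", "one": "float", "v": "vector", "w": "vector"}


def grow(return_type, depth, rng):
    """Build a random parse tree whose root returns `return_type`."""
    terminals = [n for n, t in TERMINALS.items() if t == return_type]
    functions = [n for n, (r, _) in FUNCTIONS.items() if r == return_type]
    if depth == 0 or not functions or rng.random() < 0.3:
        return rng.choice(terminals)
    name = rng.choice(functions)
    # Each argument slot is filled only by a subtree of the declared type.
    return (name,) + tuple(grow(t, depth - 1, rng) for t in FUNCTIONS[name][1])


def type_of(tree):
    """Return the type of a tree; raise TypeError if any call is ill-typed."""
    if isinstance(tree, str):
        return TERMINALS[tree]
    return_type, arg_types = FUNCTIONS[tree[0]]
    if tuple(type_of(arg) for arg in tree[1:]) != arg_types:
        raise TypeError(f"ill-typed call to {tree[0]}")
    return return_type
```

For instance, `grow("vector", 4, random.Random(0))` always yields a tree for which `type_of` returns `"vector"`. A crossover operator in the same spirit would exchange only subtrees whose types agree.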
F.14 MEMORY, STATE, AND MENTAL MODELS
F.14.1 Evolution of Indexed Memory
An independent agent can effectively perform some simple tasks merely by
reacting to information provided by its sensors about the current state of its
world. However, complex tasks generally require the acquisition, storage,
and retrieval of information in addition to the mere processing of information. The question arises as to whether genetic programming can evolve programs that use state (memory) in addition to sensory inputs in order to solve
problems.
Teller (1993, 1994a) constructed a task that could not possibly be successfully performed without the use of state. In this task the independent agent
had to push several boxes to the edges of a grid. Teller gave the agent access
to an indexed memory capable of storing 20 numbers. Teller used a two-argument WRITE operator for writing a particular value into a designated
memory cell and a one-argument READ operator for reading the value stored
in a designated memory cell. Thus, both WRITE and READ accessed memory
in an indexed way.
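A minimal Python sketch of such an indexed memory is given below. The class name `IndexedMemory`, the initial contents of zero, the wrapping of out-of-range indices modulo the memory size, and the choice to have the write operation return the previous contents of the cell are all assumptions made for this illustration; the description above specifies only the number of cells and the arity of the two operators.

```python
class IndexedMemory:
    """A fixed-size array of numbers addressed by index."""

    def __init__(self, size=20):
        # Assumption: every cell starts at zero.
        self.cells = [0] * size

    def read(self, index):
        """One-argument READ: return the value stored in the designated cell."""
        # Assumption: out-of-range indices wrap around.
        return self.cells[index % len(self.cells)]

    def write(self, index, value):
        """Two-argument WRITE: store `value` in the designated cell."""
        i = index % len(self.cells)
        previous = self.cells[i]
        self.cells[i] = value
        # Assumption: return the old contents so that WRITE can be nested
        # inside other expressions like any other function.
        return previous
```

Because both operations return a number for any integer index, they can appear anywhere in an evolved expression alongside arithmetic functions without violating closure.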
By writing and reading information to memory the genetically evolved
program created a mental model of the environment. The genetically evolved
mental model was not the kind of iconic or sentential model envisaged by
practitioners of symbolic artificial intelligence. The sequence of information
processing steps performed in the memory was not readily comprehensible.
However, the usefulness of the genetically evolved mental model was demonstrated by the otherwise unattainable high scores achieved in performing
the task. When Teller introduced lesions to certain parts of the memory, the
agent's success in performing the task was degraded. Lesions to other parts
of the memory had no effect.
Teller compared solutions to his box-pushing problem both with and without automatically defined functions and reported that automatically defined
functions improved the performance on this task.
F.14.2 Map-Making and Map-Using
Andre (1994b) described a task involving the search for buried gold. Each
program in the population had two branches, called the map-maker and the
map-user. The map-maker could examine the environment for buried gold
and could store information in an indexed memory; however, it had no tools
for digging up the gold. The map-user was incapable of sensing gold; but it
had access to the information stored in the indexed memory and had the tools
to dig up gold. A mental model of the environment stored in an indexed
memory enabled the map-user to find the gold.
F.15 THEORETICAL FOUNDATIONS
F.15.1 Evolution of Evolvability
Altenberg (1994) has explored the notion of evolvability, by which he means
the ability of a population to produce variants that are fitter than any yet
existing. Altenberg used Price's theorem on covariance and selection to clarify
the relationship between the fitness measure, the representation scheme, and
the genetic operators in genetic programming. Altenberg analyzed the relationship of evolvability to the observed proliferation of common blocks of
code within programs evolved using genetic programming. This important
theoretical analysis points the way to several intriguing ways to improve the
power of a genetic programming system.
F.15.2 Fitness Landscapes and Difficulty
The concept of fitness landscape, introduced by the biologist Sewall Wright,
refers to the mapping from the genomes of the population to their fitnesses.
Kinnear (1994c) compared various measures for the fitness landscapes for a
range of problems to the difficulty of the problems as perceived by genetic
programming. He found that the autocorrelation of the fitness values of the
result of random walks was only a weak indicator of the difficulty, and that
some measures determined from adaptive walks appeared to offer greater
predictive value.
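The random-walk measure can be made concrete with a short Python sketch. It is not Kinnear's procedure: the function names, the caller-supplied `mutate` and `fitness` functions, and the particular normalization of the autocorrelation (lagged covariance divided by the total sum of squared deviations) are assumptions made for this illustration.

```python
def random_walk_fitness(start, mutate, fitness, steps, rng):
    """Record the fitness at each point of a walk of `steps` random mutations."""
    current = start
    series = [fitness(current)]
    for _ in range(steps):
        current = mutate(current, rng)  # hypothetical one-step variation operator
        series.append(fitness(current))
    return series


def autocorrelation(values, lag=1):
    """Autocorrelation of a fitness series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # a constant series carries no correlation information
    lagged = sum((values[i] - mean) * (values[i + lag] - mean)
                 for i in range(n - lag))
    return lagged / total
```

A value near 1 indicates that neighboring points on the walk have similar fitness (a smooth landscape), whereas a value near 0 indicates a rugged one.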
F.15.3 Schema in Genetic Programming
O'Reilly and Oppacher (1994) defined a schema, the order of a schema, and
the defining length of a schema and accounted for the variable length and the
non-homologous nature of the representation in genetic programming. They
formulated a schema theorem for genetic programming. Their schema theorem, in turn, leads to a testable hypothetical account of how genetic programming searches by hierarchically combining building blocks.
F.15.4 Turing Completeness
Teller (1994b, 1994c) showed that when genetic programming is combined
with indexed memory (described in subsection F.14.1), the resulting system
is Turing complete.
Appendix G: Electronic Mailing List and
Public Repository
Additional information on genetic programming can be obtained from the
mailing list and the on-line public repository and FTP site described below.
G.1 ELECTRONIC MAILING LIST
A mailing list on genetic programming has been established and is currently maintained by James P. Rice of the Knowledge Systems Laboratory
of Stanford University. You may subscribe to this on-line mailing list, at
no charge, by sending a subscription request on the Internet consisting of
the message subscribe genetic-programming to genetic-programming-REQUEST@cs.stanford.edu by electronic mail.
G.2 PUBLIC REPOSITORY AND FTP SITE
An on-line public repository and FTP (file transfer protocol) site containing
computer code, papers on genetic programming, and frequently asked questions has been established on the Internet and is currently maintained by
James McCoy of the Computation Center at the University of Texas
at Austin.
This repository may be accessed on the Internet by anonymous FTP from
the site ftp.cc.utexas.edu and the pub/genetic-programming directory.
This FTP site contains
• the Common LISP computer code appearing in appendix E of this book for
implementing automatically defined functions,
• the original "Little LISP" computer code written in Common LISP for
genetic programming as contained in appendixes B and C of Genetic
Programming: On the Programming of Computers by Means of Natural Selection
(Koza 1992a),
• various computer implementations of genetic programming written by
others in C, C++, and other programming languages,
• various papers on genetic programming (often including some not-yet
published papers),
• answers to frequently asked questions about genetic programming, and
• back issues of the GP mailing list.
Bibliography
The symbol = indicates that the reference is discussed in the annotated bibliography of appendix F.
Aarts, E. and Korst, J. 1989. Simulated Annealing and Boltzmann Machines. Wiley.
=Abbott, R. J. 1991. Niches as a GA divide-and-conquer strategy. In Chapman, Art and Myers, Leonard (editors). Proceedings of the Second Annual AI Symposium for the California State University. California State University.
Albrecht, R. F., Reeves, C. R., and Steele, N. C. 1993. Artificial Neural Nets and Genetic Algorithms. Springer-Verlag.
=Altenberg, L. 1994. The evolution of evolvability in genetic programming. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Andre, D. 1994a. Automatically defined features: The simultaneous evolution of 2-dimensional feature detectors and an algorithm for using them. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Andre, D. 1994b. Evolution of map making: Learning, planning, and memory using genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
=Andre, D. 1994c. Learning and upgrading rules for an OCR system using genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
=Andrews, M. and Prager, R. 1994. Genetic programming for the acquisition of double auction market strategies. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Anfinsen, C. B. 1973. Principles that govern the folding of protein chains. Science 181: 223-230.
=Angeline, P. J. 1994a. Evolutionary Algorithms and Emergent Intelligence. Ph.D. dissertation. Computer Science Department. The Ohio State University.
=Angeline, P. J. 1994b. Genetic programming and the emergence of intelligence. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Angeline, P. J. 1994c. Genetic programming: A current snapshot. In Fogel, D. B. and Atmar, W. (editors). Proceedings of the Third Annual Conference on Evolutionary Programming. Evolutionary Programming Society 1994.
=Angeline, P. J. and Pollack, J. B. 1992. The evolutionary induction of subroutines. Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum.
=Angeline, P. J. and Pollack, J. B. 1993a. Coevolving High-Level Representations. Technical report 92-PA-COEVOLVE. Laboratory for Artificial Intelligence. The Ohio State University. July 1993.
=Angeline, P. J. and Pollack, J. B. 1993b. Competitive environments evolve better solutions for complex tasks. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Angeline, P. J. and Pollack, J. B. 1994. Coevolving high-level representations. In Langton, C. G. (editor). Artificial Life III, SFI Studies in the Sciences of Complexity. Volume XVII. Addison-Wesley.
Argos, P. 1989. Predictions of protein structure from gene and amino acid sequences. In Creighton, T. (editor). Protein Structure: A Practical Approach. IRL Press.
Arikawa, S., Kuhara, S., Miyano, S., Shinohara, A., and Shinohara, T. 1992. A learning algorithm for elementary formal systems and its experiments on identification of transmembrane domains. In Shriver, B. D. (editor). Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences 1992. The IEEE Computer Society Press. Volume I.
=Atkin, M. and Cohen, P. R. 1993a. Genetic programming to learn an agent's monitoring strategy. Proceedings of the AAAI-93 Workshop on Learning Action Models. AAAI Press.
=Atkin, M. and Cohen, P. R. 1993b. Genetic programming to learn an agent's monitoring strategy.
Technical report TR-93-26, Computer Science Department, University of Massachusetts, Amherst.
=Atkin, M. and Cohen, P. R. 1994. Learning monitoring strategies: A difficult genetic programming application. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Bairoch, A. and Boeckmann, B. 1991. The SWISS-PROT protein sequence data bank. Nucleic Acids Research 19: 2247-2249.
=Banzhaf, W. 1993. Genetic programming for pedestrians. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
Barr, A., Cohen, P. R. and Feigenbaum, E. A. 1989. The Handbook of Artificial Intelligence. Addison-Wesley. Volume IV.
Bauer, R. J. 1994. Genetic Algorithms and Investment Strategies. Wiley.
Bell, G. I. and Marr, T. G. (editors). 1990. Computers and DNA. Addison-Wesley.
Beer, R. D. 1990. Intelligence as Adaptive Behavior: Experiments in Computational Neuroethology. Academic Press.
Belew, R., and Booker, L. (editors). 1991. Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann.
Belew, R., McInerney, J., and Schraudolph, N. N. 1991. Evolving networks: Using the genetic algorithm with connectionist learning. In Langton, Christopher, et al. (editors). Artificial Life II, SFI Studies in the Sciences of Complexity. Volume X. Addison-Wesley.
=Bengio, S., Bengio, Y., and Cloutier, J. 1994. Use of genetic programming for the search of a new learning rule for neural networks. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T., and Tasumi, M. 1977. The protein data bank: A computer based archival file for macromolecular structures. Journal of Molecular Biology. 112: 535-542.
Branden, C. and Tooze, J. 1991. Introduction to Protein Structure. Garland Publishing.
Brooks, R. 1989. A robot that walks: Emergent behaviors from a carefully evolved network. Neural Computation 1(2): 253-262.
Buckles, B. P. and Petry, F. E. 1992. Genetic Algorithms. The IEEE Computer Society Press.
Cantor, C. R. and Lim, H. A. (editors). 1991. The First International Conference on Electrophoresis, Supercomputing, and the Human Genome. World Scientific.
Cedeno, W. and Vemuri, V. 1993. An investigation of DNA mapping with genetic algorithms: Preliminary results. Proceedings of the Fifth Workshop on Neural Networks: An International Conference on Computational Intelligence: Neural Networks, Fuzzy Systems, Evolutionary Programming, and Virtual Reality. The Society for Computer Simulation.
Charniak, E. and McDermott, D. 1985. Introduction to Artificial Intelligence. Addison-Wesley.
Chou, P. Y. and Fasman, G. D. 1974a. Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry. 13: 211-222.
Chou, P. Y. and Fasman, G. D. 1974b. Prediction of protein conformation. Biochemistry. 13: 222-245.
Collins, R. and Jefferson, D. 1991. Representations for artificial organisms. In Meyer, J.-A., and Wilson, S. W. From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. The MIT Press.
Creighton, T. E. 1993. Proteins: Structures and Molecular Properties. Second Edition. W. H. Freeman.
=Das, S., Franguiadakis, T., Papka, M., DeFanti, T. A., and Sandin, D. J. 1994. A genetic programming application in virtual reality. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Davidor, Y. 1991. Genetic Algorithms and Robotics. World Scientific.
Davis, L. (editor). 1987. Genetic Algorithms and Simulated Annealing. Pittman.
Davis, L. 1991. Handbook of Genetic Algorithms. Van Nostrand Reinhold.
DeJong, G. 1981. Generalization based on explanations. Proceedings of the Seventh International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
DeJong, G. 1983. Acquiring schemata through understanding and generalizing plans. Proceedings of the Eighth International Joint Conference on Artificial Intelligence. Morgan Kaufmann.
=D'haeseleer, P. and Bluming, J. 1994. Effects of locality in individual and population evolution. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=D'haeseleer, P. 1994. Context preserving crossover in genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Doolittle, R. F. (editor). 1990. Methods in Enzymology - Volume 183 - Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences. Academic Press.
Doolittle, R. F. 1987. Of URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences. University Science Books.
=Dunay, B. D., Petry, F. E., and Buckles, W. P. 1994. Regular language induction with genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Engelman, D., Steitz, T., and Goldman, A. 1986. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annual Review of Biophysics and Biophysical Chemistry. Annual Reviews. Volume 15.
Fasman, G. D. 1990. Prediction of Protein Structure and the Principles of Protein Conformation. Plenum Press.
Fickett, J. W. and Cinkosky, M. J. 1993. A genetic algorithm for assembling chromosome physical maps. In Lim, H. A., Fickett, J. W., Cantor, C. R., and Robbins, R. J. (editors). The Second
International Conference on Bioinformatics, Supercomputing, and Complex Genome Analysis. World
Scientific.
Fikes, R. E., Hart, P. E., and Nilsson, N. J. 1972. Learning and executing generalized robot plans. Artificial Intelligence, 3: 251-288.
Fisher, R. A. 1958. Statistical Methods for Research Workers. 13th Edition. Hafner.
Fogel, D. B. 1991. System Identification through Simulated Evolution. Ginn Press.
Fogel, D. B. and Atmar, W. (editors). 1992. Proceedings of the First Annual Conference on Evolutionary Programming. Evolutionary Programming Society.
Fogel, D. B. and Atmar, W. (editors). 1993. Proceedings of the Second Annual Conference on Evolutionary Programming. Evolutionary Programming Society.
Forrest, S. (editor). 1990. Emergent Computation: Self-Organizing, Collective, and Cooperative
Computing Networks. The MIT Press.
Forrest, S. 1991. Parallelism and Programming in Classifier Systems. Pittman.
Forrest, S. (editor). 1993. Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Fraser, A. P. and Rush, J. R. 1994. Putting INK into a BIRo: A discussion of problem domain knowledge for evolutionary robotics. Proceedings of the Workshop on Artificial Intelligence and Simulation of Behaviour Workshop on Evolutionary Computing, April 11-13, 1994.
Fukushima, K. and Miyake, S. 1982. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15(6): 455-469.
Fukushima, K., Miyake, S., and Takatuki, I. 1983. IEEE Transactions on Systems, Man, and Cybernetics. 13(5): 826-834.
Fukushima, K. 1989. Analysis of the process of visual pattern recognition by neocognitron. Neural Networks, 2: 413-420.
=Ghanea-Hercock, R. and Fraser, A. P. 1994. Evolution of autonomous robot control architectures. Proceedings of the Workshop on Artificial Intelligence and Simulation of Behaviour Workshop on Evolutionary Computing, April 11-13, 1994.
Gierasch, L. M. and King, J. 1990. Protein Folding: Deciphering the Second Half of the Genetic Code. American Association for the Advancement of Science.
Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Goldberg, D. E. and Deb, K. 1991. A comparative analysis of selection schemes used in genetic algorithms. In Rawlins, G. (editor). Foundations of Genetic Algorithms. Morgan Kaufmann.
Grefenstette, J. J. (editor). 1985. Proceedings of an International Conference on Genetic Algorithms and Their Applications. Erlbaum.
Grefenstette, J. J. (editor). 1987. Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms. Erlbaum.
=Gruau, F. 1992a. Genetic synthesis of Boolean neural networks with a cell rewriting developmental process. In Schaffer, J. D. and Whitley, D. (editors). Proceedings of the Workshop on Combinations of Genetic Algorithms and Neural Networks 1992. The IEEE Computer Society Press.
=Gruau, F. 1992b. Cellular encoding of Genetic Neural Networks. Technical report 92-21. Laboratoire de l'Informatique du Parallélisme. Ecole Normale Supérieure de Lyon.
=Gruau, F. 1993a. Genetic synthesis of modular neural networks. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Gruau, F. 1993b. Grammatical inference with genetic search using cellular encoding. In Lucas, Simon (editor). Proceedings of the International Conference on Grammatical Inference. The Institution of Electrical Engineers, London.
=Gruau, F. 1994a. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm. PhD thesis. Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon.
=Gruau, F. 1994b. Genetic micro programming of neural networks. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Gruau, F. and Whitley, D. 1993a. The cellular development of neural networks: The interaction of learning and evolution. Technical report 93-04. Laboratoire de l'Informatique du Parallélisme, Ecole Normale Supérieure de Lyon.
=Gruau, F. and Whitley, D. 1993b. Adding learning to the cellular development process: a comparative study. Evolutionary Computation 1(3): 213-233.
Hamaguchi, K. 1992. The Protein Molecule: Conformation, Stability, and Folding. Japan Scientific Societies Press.
=Handley, S. 1993a. Automated learning of a detector for α-helices in protein sequences via genetic programming. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Handley, S. 1993b. The genetic planner: The automatic generation of plans for a mobile robot via genetic programming. Proceedings of the Eighth IEEE International Symposium on Intelligent Control. The IEEE Control System Society.
=Handley, S. 1993c. The automatic generation of plans for a mobile robot via genetic programming with automatically defined functions. Proceedings of the Fifth Workshop on Neural Networks: An International Conference on Computational Intelligence: Neural Networks, Fuzzy Systems, Evolutionary Programming, and Virtual Reality. The Society for Computer Simulation.
=Handley, S. 1994a. The automatic generation of plans for a mobile robot via genetic programming with automatically defined functions. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Handley, S. 1994b. On the use of a directed acyclic graph to represent a population of computer programs. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
=Handley, S. 1994c. Automated learning of a detector for the cores of α-helices in protein sequences via genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Hibbert, D. B. 1993. Display of chemical structures in two dimensions and the evolution of molecular recognition. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
Hillis, W. D. 1990. Co-evolving parasites improve simulated evolution as an optimization procedure. In Forrest, S. (editor). Emergent Computation: Self-Organizing, Collective, and Cooperative Computing Networks. The MIT Press.
Hillis, W. D. 1991. Co-evolving parasites improve simulated evolution as an optimization procedure. In Langton, Christopher, et al. (editors). Artificial Life II, SFI Studies in the Sciences of Complexity. Volume X. Addison-Wesley.
Hinton, G. 1989. Connectionist learning procedures. Artificial Intelligence. 40: 185-234.
Holbrook, S. R., Muskal, S. M., and Kim, S. H. 1990. Predicting surface exposure of amino acids from protein sequence. Protein Engineering. 3(8): 659-665.
Holland, J. H. 1975. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. University of Michigan Press. Also second edition, The MIT Press 1992.
Holland, J. H. 1986. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In Michalski, Ryszard S. et al. (editors). Machine Learning: An Artificial Intelligence Approach, Volume II. Morgan Kaufmann.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. A. 1986. Induction: Processes of Inference, Learning, and Discovery. The MIT Press.
Hopp, T. P. and Woods, K. R. 1981. Proceedings of the National Academy of Sciences USA. 78: 3824-3828.
Hunter, L. (editor). 1993. Artificial Intelligence and Molecular Biology. AAAI Press.
Hunter, L., Searls, D., and Shavlik, J. (editors). 1993. Proceedings of the First International Conference on Intelligent Systems for Molecular Biology. AAAI Press.
=Iba, H., de Garis, H., and Higuchi, T. 1993. Evolutionary learning of predatory behaviors based on structured classifiers. In Meyer, J. A., Roitblat, H. L. and Wilson, S. W. (editors). From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior. The MIT Press.
=Iba, H., deGaris, H., and Sato, T. 1993. Solving identification problems by structured genetic algorithms. Technical report ETL-TR-93-17. Japan Electrotechnical Laboratory.
=Iba, H., deGaris, H., and Sato, T. 1994. Genetic programming using a minimum description length principle. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Iba, H., Kurita, T., de Garis, H., and Sato, T. 1993. System identification using structured genetic algorithms. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Iba, H. and Sato, T. 1992. Meta-level strategy learning for GA based on structured representation. In Proceedings of the Second Pacific Rim International Conference on Artificial Intelligence. Center for Artificial Intelligence Research, Kaist.
=Iba, H. and Sato, T. 1994. Extension of STROGANOFF for symbolic problems. Technical report ETL-TR-94-1. Japan Electrotechnical Laboratory.
=Iba, H., Sato, T., and deGaris, H. 1994. System identification approach to genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Ioerger, T. R., Rendell, L., and Subramaniam, S. 1993. Constructive induction and protein tertiary structure prediction. In Searls, D., and Shavlik, J. (editors). 1993. Proceedings of the First International Conference on Intelligent Systems for Molecular Biology. AAAI Press.
Ishikawa, M., Toya, T., Totoki, Y., and Konagaya, A. 1993. Parallel iterative aligner with genetic algorithm. In Takagi, T., Imai, H., Miyano, S., Mitaku, S., and Kanehisa, M. (editors). Genome Informatics Workshop IV. Universal Academy Press.
Ivakhnenko, A. G. 1971. Polynomial theory of complex systems. IEEE Transactions on Systems, Man, and Cybernetics. 1(4): 364-378.
=Jannink, J. 1994. Cracking and co-evolving randomizers. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Jefferson, D., Collins, R., Cooper, C., Dyer, M., Flowers, M., Korf, R., Taylor, C., and Wang, A. 1991. Evolution as a theme in artificial life: The genesys/tracker system. In Langton, C., et al. (editors). Artificial Life II, SFI Studies in the Sciences of Complexity. Volume X. Addison-Wesley.
=Jiang, M. 1992. A hierarchical genetic system for symbolic function identification. Master's thesis. University of Montana.
=Jiang, M. 1993. An adaptive function identification system. Proceedings of the IEEE/ACM Conference on Developing and Managing Intelligent System Projects, Vienna, Virginia, March 1993.
=Jiang, M. and Wright, A. H. 1992. A hierarchical genetic system for symbolic function identification. Proceedings of the 24th Symposium on the Interface: Computing Science and Statistics, College Station, Texas, March 1992.
=Jones, A. 1991. Writing Programs Using Genetic Algorithms. M.Sc. thesis, Department of Computer Science, University of Manchester, United Kingdom.
Jones, G., Brown, R. D., Clark, D. E., Willett, P., and Glen, R. C. 1993. Searching databases of two-dimensional and three-dimensional chemical structures using genetic algorithms. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 22: 2577-2637.
Keane, M. A., Koza, J. R., and Rice, J. P. 1993. Finding an impulse response function using genetic programming. Proceedings of the 1993 American Control Conference. American Automatic Control Council. Volume III.
=Keith, M. J. and Martin, M. C. 1994. Genetic programming in C++: Implementation issues. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Kendrew, J. C. 1958. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature. 181: 662-666.
=Kinnear, K. E., Jr. 1993a. Evolving a sort: Lessons in genetic programming. 1993 IEEE International Conference on Neural Networks, San Francisco. IEEE Press. Volume 2.
=Kinnear, K. E., Jr. 1993b. Generality and difficulty in genetic programming: Evolving a sort. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Kinnear, K. E. Jr. (editor). 1994a. Advances in Genetic Programming. Cambridge: The MIT Press.
=Kinnear, K. E., Jr. 1994b. Alternatives in automatic function definition: A comparison of performance. In Kinnear, K. E., Jr. (editor). Advances in Genetic Programming. Cambridge: The MIT Press.
=Kinnear, K. E., Jr. 1994c. Fitness landscapes and difficulty in genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. 1983. Optimization by simulated annealing. Science 220: 671-680.
Konagaya, A. and Kondou, H. 1993. Stochastic motif extraction using a genetic algorithm with the MDL principle. In Mudge, T. N., Milutinovic, V., and Hunter, L. (editors). Proceedings of the Twenty-Sixth Annual Hawaii International Conference on Systems Science 1993. The IEEE Computer Society Press. Volume I.
Korf, R. E. 1980. Toward a model of representation changes. Artificial Intelligence, 14, 41-78.
Korf, R. E. 1985a. Macro-operators: A weak method for learning. Artificial Intelligence, 26, 35-77.
Korf, R. E. 1985b. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence, 27, 97-109.
Koza, J. R. 1972. On Inducing a Non-Trivial, Parsimonious, Hierarchical Grammar for a Given Sample of Sentences. Ph.D. dissertation, Department of Computer Science, University of Michigan.
Koza, J. R. 1988. Non-Linear Genetic Algorithms for Solving Problems. U.S. Patent Application filed May 20, 1988.
Koza, J. R. 1989. Hierarchical genetic algorithms operating on populations of computer programs. In Proceedings of the 11th International Joint Conference on Artificial Intelligence. Morgan Kaufmann. Volume I.
Koza, J. R. 1990a. Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems. Stanford University Computer Science Department technical report STAN-CS-90-1314.
Koza, J. R. 1990b. A genetic approach to econometric modeling. Paper presented at Sixth World Congress of the Econometric Society, Barcelona, Spain. August 22, 1990.
Koza, J. R. 1990c. Genetically breeding populations of computer programs to solve problems in artificial intelligence. In Proceedings of the Second International Conference on Tools for AI. The IEEE Computer Society Press.
Koza, J. R. 1990d. Non-Linear Genetic Algorithms for Solving Problems. Filed May 20, 1988. U.S. Patent 4,935,877. Issued June 19, 1990.
Koza, J. R. 1990e. Non-Linear Genetic Algorithms for Solving Problems by Finding a Fit Composition of Functions. U.S. Patent Application filed March 29, 1990.
Koza, J. R. 1991a. Evolution and co-evolution of computer programs to control independent-acting agents. In Meyer, J.-A., and Wilson, S. W. From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. The MIT Press.
Koza, J. R. 1991b. Concept formation and decision tree induction using the genetic programming paradigm. In Schwefel, H. P. and Maenner, R. (editors). Parallel Problem Solving from Nature. Springer-Verlag.
Koza, J. R. 1991c. Genetic evolution and co-evolution of computer programs. In Langton, Christopher, et al. (editors). Artificial Life II, SFI Studies in the Sciences of Complexity. Volume X. Addison-Wesley.
Koza, J. R. 1991d. A hierarchical approach to learning the Boolean multiplexer function. In Rawlins, G. (editor). Foundations of Genetic Algorithms. Morgan Kaufmann.
Koza, J. R. 1991e. Evolving a computer program to generate random numbers using the genetic programming paradigm. In Belew, R. and Booker, L. (editors). Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann.
Koza, J. R. 1991f. A genetic approach to econometric modeling. In Bourgine, P. and Walliser, B. (editors). Economics and Cognitive Science. Pergamon.
Koza, J. R. 1992a. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press.
Koza, J. R. 1992b. Hierarchical automatic function definition in genetic programming. In Whitley, D. (editor). Proceedings of the Workshop on the Foundations of Genetic Algorithms and Classifier Systems, Vail, Colorado 1992. Morgan Kaufmann.
Koza, J. R. 1992c. The genetic programming paradigm: Genetically breeding populations of computer programs to solve problems. In Soucek, B. and the IRIS Group (editors). Dynamic, Genetic, and Chaotic Programming. Wiley.
Koza, J. R. 1992d. A genetic approach to finding a controller to back up a tractor-trailer truck. In Proceedings of the 1992 American Control Conference. American Automatic Control Council.
Koza, J. R. 1992e. A genetic approach to the truck backer upper problem and the inter-twined spirals problem. In Proceedings of International Joint Conference on Neural Networks, Baltimore, June 1992. IEEE Press.
Koza, J. R. 1992f. Evolution of subsumption using genetic programming. In Varela, F. J., and Bourgine, P. (editors). Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life. The MIT Press.
Koza, J. R. 1992g. Non-Linear Genetic Algorithms for Solving Problems by Finding a Fit Composition of Functions. U.S. Patent 5,136,686. Filed March 28, 1990. Issued August 4, 1992.
Koza, J. R. 1992h. Genetic evolution and co-evolution of game strategies. Paper presented at the International Conference on Game Theory and Its Applications, Stony Brook, New York. July 15, 1992.
Koza, J. R. 1992i. Non-Linear Genetic Algorithms for Solving Problems. Canadian Patent 1,311,567. Issued December 15, 1992.
Koza, J. R. 1992j. Non-Linear Genetic Algorithms for Solving Problems. Australian Patent 611,350. Issued September 21, 1991.
Koza, J. R. 1993a. Simultaneous discovery of detectors and a way of using the detectors via genetic programming. 1993 IEEE International Conference on Neural Networks, San Francisco. IEEE 1993. Volume III.
Koza, J. R. 1993b. Simultaneous discovery of reusable detectors and subroutines using genetic
programming. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic
Algorithms. Morgan Kaufmann.
Koza, J. R. 1993c. Discovery of a main program and reusable subroutines using genetic programming. 1993. Proceedings of the Fifth Workshop on Neural Networks: An International Conference on Computational Intelligence: Neural Networks, Fuzzy Systems, Evolutionary Programming, and Virtual Reality. The Society for Computer Simulation.
Koza, John R. (editor) 1993d. Artificial Life at Stanford 1993. Stanford University Bookstore.
Koza, John R. (editor) 1993e. Genetic Algorithms at Stanford 1993. Stanford University Bookstore.
Koza, J. R. 1994a. Scalable learning in genetic programming using automatically defined functions. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Koza, J. R. 1994b. Introduction to genetic programming. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Koza, J. R. 1994c. Spontaneous emergence of self-replicating and evolutionarily self-improving computer programs. In Langton, C. G. (editor). 1994. Artificial Life III, SFI Studies in the Sciences of Complexity. Volume XVII. Addison-Wesley.
Koza, J. R. 1994d. Recognizing patterns in protein sequences using iteration-performing calculations in genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Koza, J. R. 1994e. Evolution of a subsumption architecture that performs a wall following task for an autonomous mobile robot via genetic programming. In Petsche, T. (editor). Computational Learning Theory and Natural Learning Systems, Volume 2. The MIT Press.
Koza, John R. 1994f. Automated discovery of detectors and iteration-performing calculations to recognize patterns in protein sequences using genetic programming. Proceedings of the Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press.
Koza, J. R., and Keane, M. A. 1990a. Cart centering and broom balancing by genetically breeding populations of control strategy programs. In Proceedings of International Joint Conference on Neural Networks, Washington, January 15-19, 1990. Volume I, Erlbaum.
Koza, J. R., and Keane, M. A. 1990b. Genetic breeding of non-linear optimal control strategies for broom balancing. In Proceedings of the Ninth International Conference on Analysis and Optimization of Systems, Antibes, France. Springer-Verlag.
Koza, J. R., Keane, M. A., and Rice, J. P. 1993. Performance improvement of machine learning via automatic discovery of facilitating functions as applied to a problem of symbolic system identification. 1993 IEEE International Conference on Neural Networks, San Francisco. IEEE 1993. Volume I.
Koza, J. R., and Rice, J. P. 1990. A Non-Linear Genetic Process for Use with Co-Evolving Populations. U.S. Patent Application filed September 18, 1990.
Koza, J. R., and Rice, J. P. 1991a. Genetic generation of both the weights and architecture for a neural network. In Proceedings of International Joint Conference on Neural Networks, Seattle, July 1991, volume II. IEEE Press.
Koza, J. R., and Rice, J. P. 1991b. A genetic approach to artificial intelligence. In Langton, C. G. (editor). Artificial Life II Video Proceedings. Addison-Wesley.
Koza, J. R., and Rice, J. P. 1992a. Genetic Programming: The Movie. The MIT Press.
Koza, J. R., and Rice, J. P. 1992b. A Non-Linear Genetic Process for Data Encoding and for Solving Problems Using Automatically Defined Functions. U.S. Patent Application filed May 11, 1992.
Koza, J. R., and Rice, J. P. 1992c. Automatic programming of robots using genetic programming. In Proceedings of Tenth National Conference on Artificial Intelligence. AAAI Press.
Koza, J. R., and Rice, J. P. 1992c. A Non-Linear Genetic Process for Problem Solving Using Spontaneously Emergent Self-Replicating and Self-Improving Entities. U.S. Patent Application filed June 16, 1992.
Koza, J. R., and Rice, J. P. 1992d. A Non-Linear Genetic Process for Use with Plural Co-Evolving Populations. U.S. Patent 5,148,513. Filed September 18, 1990. Issued September 15, 1992.
Koza, J. R., and Rice, J. P. A Non-Linear Genetic Process for Problem Solving Using Spontaneously Emergent Self-Replicating and Self-Improving Entities. U.S. Patent Application filed June 16, 1992.
Koza, J. R., and Rice, J. P. 1994. Genetic Programming II Videotape: The Next Generation. The MIT Press.
Koza, J. R., Rice, J. P., and Roughgarden, J. 1992a. Evolution of Food Foraging Strategies for the Caribbean Anolis Lizard Using Genetic Programming. Santa Fe Institute Working Paper 92-06-028.
Koza, J. R., Rice, J. P., and Roughgarden, J. 1992b. Evolution of food foraging strategies for the Caribbean Anolis lizard using genetic programming. Adaptive Behavior. 1(2): 47-74.
=Kraft, D. H., Petry, F. E., Buckles, W. P., and Sadasivan, T. 1994. The use of genetic programming to build queries for information retrieval. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Kyte, J. and Doolittle, R. 1982. A simple method for displaying the hydropathic character of proteins. Journal of Molecular Biology. 157: 105-132.
Laird, J. E., Rosenbloom, P. S., and Newell, A. 1986a. Universal Subgoaling and Chunking. Kluwer Academic.
Laird, J. E., Rosenbloom, P. S., and Newell, A. 1986b. Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1(1): 11-46.
Langton, C. G. (editor). 1989. Artificial Life, Santa Fe Institute Studies in the Sciences of Complexity. Volume VI. Addison-Wesley.
Langton, C. G., Taylor, C., Farmer, J. D., and Rasmussen, S. (editors). 1991. Artificial Life II, SFI Studies in the Sciences of Complexity. Volume X. Addison-Wesley.
Langton, C. G. (editor). 1994. Artificial Life III, SFI Studies in the Sciences of Complexity. Volume XVII. Addison-Wesley.
Lapedes, A., Barnes, C., Burks, C., Farber, R., and Sirotkin, K. M. 1990. Application of neural networks and other machine learning algorithms to DNA sequence analysis. In Bell, G. I. and Marr, T. G. (editors). Computers and DNA. Addison-Wesley.
=Lay, M-Y. 1994. Application of genetic programming in analyzing multiple steady states of dynamical systems. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. 1990. Handwritten digit recognition with a back-propagation network. In Touretzky, D. S. (editor). Advances in Neural Information Processing Systems 2. Morgan Kaufmann.
Le Grand, S. M. 1993. The Application of the Genetic Algorithm to Protein Tertiary Structure Prediction. Ph.D. dissertation, Department of Chemistry, Biochemistry, The Pennsylvania State University.
Lesk, A. M. (editor). 1988. Computational Molecular Biology: Sources and Methods for Sequence Analysis. Oxford University Press.
Lesk, A. M. 1991. Protein Architecture: A Practical Approach. Oxford University Press.
Leszcynski, J. F. and Rose, G. D. 1986. Loops in globular proteins: A novel category of secondary structure. Science. 234: 849-855. November 14, 1986.
Lim, H. A., Fickett, J. W., Cantor, C. R., and Robbins, R. J. (editors). 1993. The Second International Conference on Bioinformatics, Supercomputing, and Complex Genome Analysis. World Scientific.
Lucasius, C. B. and Kateman, G. 1989. Application of genetic algorithms to chemometrics. In Schaffer, J. D. (editor). Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann.
Lucasius, C. B., Blommers, M. J. J., Buydens, L. M. C., and Kateman, G. 1991. In Davis, L. (editor). Handbook of Genetic Algorithms. Van Nostrand Reinhold.
Maenner, R. and Manderick, B. (editors). 1992. Proceedings of the Second International Conference on Parallel Problem Solving from Nature. North Holland.
Marcel, J. J., Blommers, M. J., Lucasius, C. B., Kateman, G., and Kaptein, R. 1992. Conformational analysis of a dinucleotide photodimer with the aid of the genetic algorithm. Biopolymers 32: 45-52.
=Massand, B. 1994. Optimizing confidence of text classification by evolution of symbolic expressions. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Matthews, B. W. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta. 405: 442-451.
=Maxwell, S. R., III. 1994. Experiments with a coroutine execution model for genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Meyer, J. A., and Wilson, S. W. From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. Paris. September 24-28, 1990. The MIT Press 1991.
Meyer, J. A., Roitblat, H. L. and Wilson, S. W. (editors). 1993. From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior. The MIT Press.
Michalewicz, Z. 1992. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag.
Miller, G. F., Todd, P. M., and Hegde, S. U. 1989. Designing Neural Networks using Genetic Algorithms. In Schaffer, J. D. (editor). Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann.
Minton, S. 1990. Quantitative results concerning the utility of explanation-based learning. In Shavlik, J. W., and Dietterich, T. G. Readings in Machine Learning. Morgan Kaufmann.
Mitchell, T. M., Keller, R. M., and Kedar-Cabelli, S. T. 1986. Explanation-based generalization: A unifying view. Machine Learning, 1(1): 47-80.
=Montana, D. J. 1993. Strongly Typed Genetic Programming. Bolt, Beranek, and Newman technical report 7866. May 7, 1993.
Muskal, S. M., Holbrook, S. R. and Kim, S. H. 1990. Prediction of the disulfide-bonding state of cysteine in proteins. Protein Engineering. 3(8): 667-672.
Muskal, S. M. and Kim, S. H. 1992. Predicting protein secondary structure content - a tandem neural network approach. Journal of Molecular Biology. 225: 713-727.
=Nguyen, T. and Huang, T. 1994. Evolvable modeling: Structural adaptation through hierarchical evolution for 3-D model-based vision. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Nilsson, N. J. 1980. Principles of Artificial Intelligence. Morgan Kaufmann.
=Nordin, P. 1994. A compiling genetic programming system that directly manipulates the machine code. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Oakley, E. H. N. 1994. Two scientific applications of genetic programming: Stack filters and non-linear equation fitting to chaotic data. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=O'Reilly, U. M. and Oppacher, F. 1992. An experimental perspective on genetic programming. In Maenner, R. and Manderick, B. (editors). Proceedings of the Second International Conference on Parallel Problem Solving from Nature. North Holland.
=O'Reilly, U. M. and Oppacher, F. 1994. The Troubling Aspects of a Building Block Hypothesis for Genetic Programming. Santa Fe Institute Working Paper 94-02-001.
Park, S. K., and Miller, K. W. 1988. Random number generators: Good ones are hard to find. Communications of the ACM. 31: 1192-1201.
Pauling, L. and Corey, R. B. 1951. Configurations of polypeptide chains with favored orientations around single bonds: Two new pleated sheets. Proceedings of the National Academy of Science USA. 37: 729-740.
Pauling, L., Corey, R. B., and Branson, H. R. 1951. The structure of proteins: Two hydrogen-bonded helical configurations of the polypeptide chain. Proceedings of the National Academy of Science USA. 37: 205-211.
=Perkis, T. 1994. Stack-based genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
=Perry, J. E. 1994. The effect of population enrichment in genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Platt, D. M. and Dix, T. I. 1993. Construction of restriction maps using a genetic algorithm. In Mudge, T. N., Milutinovic, V., and Hunter, L. (editors). Proceedings of the Twenty-Sixth Annual Hawaii International Conference on Systems Science 1993. The IEEE Computer Society Press. Volume I.
Prusinkiewicz, P. and Lindenmayer, A. 1990. The Algorithmic Beauty of Plants. Springer-Verlag.
Quinlan, J. R. 1986. Induction of decision trees. Machine Learning 1(1): 81-106.
Ragavan, H. and Rendell, L. 1993. Lookahead feature construction for learning hard concepts. In Machine Learning: Proceedings of the Tenth International Conference. Morgan Kaufmann.
Rawlins, G. (editor). 1991. Foundations of Genetic Algorithms. Morgan Kaufmann.
Rendell, L. and Seshu, R. 1990. Learning hard concepts through constructive induction: Framework and rationale. Computational Intelligence. 6: 247-270.
Reynolds, C. W. 1991. Boids. In Langton, C. G. (editor). Artificial Life II Video Proceedings. Addison-Wesley.
=Reynolds, C. W. 1993. An evolved vision-based behavioral model of coordinated group motion. In Meyer, J.-A., Roitblat, H. L. and Wilson, S. W. (editors). From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior. The MIT Press.
=Reynolds, C. W. 1994a. Evolution of obstacle avoidance behavior: Using noise to promote robust solutions. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Reynolds, C. W. 1994b. An evolved vision-based model of obstacle avoidance behavior. In Langton, C. G. (editor). Artificial Life III, SFI Studies in the Sciences of Complexity. Volume XVII. Addison-Wesley.
=Reynolds, C. W. 1994c. The difficulty of roving eyes. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Rich, E. 1983. Artificial Intelligence. McGraw-Hill.
Richardson, D. C. and Richardson, J. S. 1992. The kinemage: A tool for scientific communication. Protein Science 1(1): 3-9.
=Rosca, J. P. and Ballard, D. H. 1994a. Learning by adapting representations in genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
=Rosca, J. P. and Ballard, D. H. 1994b. Genetic Programming with Adaptive Representations. Department of Computer Science technical report 489, University of Rochester, February 1994.
Rosenbloom, P. S., Laird, J. E. 1986. Mapping explanation-based generalization onto Soar. Proceedings of the Fifth National Conference on Artificial Intelligence. Volume 1. Morgan Kaufmann.
Rosenbloom, P. S., Laird, J. E., and Newell, A. (editors). 1993. The Soar Papers. The MIT Press. Volumes I and II.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986. Learning internal representations by error propagation. In Rumelhart, D. E., McClelland, J. L., and the PDP Research Group (editors). Parallel Distributed Processing, Volume 1. The MIT Press.
=Rush, J. R., Fraser, A. P., and Barnes, D. P. 1994. Evolving co-operation in autonomous robotic systems. Proceedings of the IEE International Conference on Control, March 21-24, 1994. Institute of Electrical Engineers. London.
=Ryan, C. 1994. Pygmies and civil servants. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
Samuel, A. L. 1959. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3): 210-229. July 1959.
Schaffer, J. D. (editor). 1989. Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann.
Schaffer, J. D. and Whitley, D. (editors). 1992. Proceedings of the Workshop on Combinations of Genetic Algorithms and Neural Networks 1992. The IEEE Computer Society Press.
Schulz, G. E. and Schirmer, R. H. 1979. Principles of Protein Structure. Springer-Verlag.
Schulze-Kremer, S. 1993. Genetic algorithms for protein tertiary structure prediction. In Brazdil, P. B. (editor). Machine Learning: European Conference on Machine Learning, Vienna, Austria, April 5-7, 1993, Proceedings. Springer-Verlag.
Schwefel, H. P. and Maenner, R. (editors). 1991. Parallel Problem Solving from Nature. Springer-Verlag.
Schirmer, T. and Cowan, S. W. 1993. Prediction of membrane-spanning β-strands and its application to maltoporin. Protein Science. 2: 1361-1363. August 1993.
Shirai, Y. and Tsujii, J. 1982. Artificial Intelligence: Concepts, Techniques, and Applications. Wiley.
=Siegel, E. 1994. Competitively evolving decision trees against fixed training cases for natural language processing. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Sims, K. 1991a. Artificial evolution for computer graphics. Computer Graphics. 25(4): 319-328. July 1991.
=Sims, K. 1991b. Panspermia. In Langton, Christopher G. (editor). Artificial Life II Video Proceedings. Addison-Wesley.
=Sims, K. 1992a. Interactive evolution of dynamical systems. In Varela, F. J., and Bourgine, P. (editors). Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life. The MIT Press.
=Sims, K. 1992b. Interactive evolution of equations for procedural models. Proceedings of IMAGINA conference, Monte Carlo, January 29-31, 1992.
=Sims, K. 1993a. Interactive evolution of equations for procedural models. The Visual Computer. 9: 466-476.
=Sims, K. 1993b. Evolving Images. Lecture presented at Centre George Pompidou, Paris on March 4, 1993. Notebook. Number 5.
=Singleton, A. 1994. Genetic programming with C++. Byte, 19(2): 171-176.
=Spencer, G. 1993. Automatic generation of programs for crawling and walking. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Spencer, G. 1994. Automatic generation of programs for crawling and walking. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Spector, L. 1994. Genetic programming and AI planning systems. Proceedings of Twelfth National Conference on Artificial Intelligence. AAAI Press / The MIT Press.
=Spector, L. and Alpern, A. 1994. Criticism, culture, and the automatic generation of artworks. Proceedings of Twelfth National Conference on Artificial Intelligence. AAAI Press / The MIT Press.
Steele, G. L. Jr. 1990. Common LISP: The Language. Digital Press. Second Edition.
Stender, J. (editor). 1993. Parallel Genetic Algorithms. IOS Publishing.
Stryer, Lubert. 1988. Biochemistry. W. H. Freeman. Third Edition.
Sun, S. 1993. Reduced representation model of protein structure prediction: Statistical potential and genetic algorithms. Protein Science. 2(5): 762-785. May 1993.
=Tackett, W. A. 1993a. Genetic programming for feature discovery and image discrimination. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
=Tackett, W. A. 1993b. Genetic generation of dendritic trees for image classification. Proceedings of the World Conference on Neural Networks, Portland, Oregon, July 1993. IEEE Press.
=Tackett, W. A. 1994. Recombination, Selection, and the Genetic Construction of Computer Programs. Ph.D. dissertation, University of Southern California, Department of Electrical Engineering Systems.
=Tackett, W. A. and Carmi, A. 1994a. Scalability, generalization, and breeding schemes in genetic programming: The donut problem. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Tackett, W. A. and Carmi, A. 1994b. The unique implications of brood selection for genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Tajima, K. 1993. Multiple sequence alignment using parallel genetic algorithms. In Takagi, T., Imai, H., Miyano, S., Mitaku, S., and Kanehisa, M. (editors). Genome Informatics Workshop IV. Universal Academy Press.
Takagi, T., Imai, H., Miyano, S., Mitaku, S., and Kanehisa, M. (editors). 1993. Genome Informatics Workshop IV. Universal Academy Press.
Tanimoto, S. L. 1987. The Elements of Artificial Intelligence. Computer Science Press.
=Teller, A. 1993. Learning mental models. Proceedings of the Fifth Workshop on Neural Networks: An International Conference on Computational Intelligence: Neural Networks, Fuzzy Systems, Evolutionary Programming, and Virtual Reality. The Society for Computer Simulation.
=Teller, A. 1994a. The evolution of mental models. In Kinnear, K. E. Jr. (editor). Advances in Genetic Programming. The MIT Press.
=Teller, A. 1994b. Genetic programming, indexed memory, the halting problem, and other curiosities. Proceedings of the Seventh Florida Artificial Intelligence Research Symposium.
=Teller, A. 1994c. Turing completeness in the language of genetic programming with indexed memory. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Teufel, M., Pompejus, M., Humbel, B., Friedrich, K., and Fritz, H. J. 1993. Properties of bacteriorhodopsin derivatives constructed by insertion of an exogenous epitope into extra-membrane loops. The EMBO Journal. 12(9): 3399-3408.
=Thonemann, U. W. 1992. Verbesserung des Simulated Annealing unter Anwendung Genetischer Programmierung am Beispiel des Diskreten Quadratischen Layoutproblems. Master's thesis, University of Paderborn, Germany.
=Thonemann, U. W. 1994. Finding improved simulated annealing schedules with genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Uhr, L. and Vossler, C. 1966. A pattern recognition program that generates, evaluates, and adjusts its own operators. In Uhr, Leonard (editor). Pattern Recognition. Wiley.
Unger, R. and Moult, J. 1993a. On the applicability of genetic algorithms to protein folding. In Mudge, T. N., Milutinovic, V., and Hunter, L. (editors). Proceedings of the Twenty-Sixth Annual Hawaii International Conference on Systems Science 1993. The IEEE Computer Society Press. Volume I.
Unger, R. and Moult, J. 1993b. A genetic algorithm for 3D protein folding simulations. In Forrest, S. (editor). Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann.
Unger, R. and Moult, J. 1993c. Genetic algorithms for protein folding simulations. Journal of Molecular Biology. 231: 75-81.
van Laarhoven, P. J. M., and Aarts, E. H. 1987. Simulated Annealing: Theory and Applications. Reidel.
von Heijne, G. 1992. Membrane protein structure prediction: Hydrophobicity analysis and the positive-inside rule. Journal of Molecular Biology. 225: 487-494.
Weiss, S. M. and Indurkhya, N. 1991. Reduced complexity rule induction. Proceedings of the Twelfth International Joint Conference on Artificial Intelligence.
Weiss, S. M., Cohen, D. M., and Indurkhya, N. 1993. Transmembrane segment prediction from protein sequence data. In Hunter, L., Searls, D., and Shavlik, J. (editors). Proceedings of the First International Conference on Intelligent Systems for Molecular Biology. AAAI Press.
Whitley, D (editor). 1992. Proceedings of Workshop on the Foundations of Genetic Algorithms and
Classifier Systems, Vail, Colorado 1992. Morgan Kaufmann.
Whitley, D., Starkweather, T., and Bogart, C. 1990. Genetic algorithms and neural networks: Optimizing connections and connectivity. Parallel Computing. 14(3): 347-361.
Wilson, S. W. 1990. Perceptron redux: Emergence of structure. In Forrest, S. (editor). Emergent Computation: Self-Organizing, Collective, and Cooperative Computing Networks. The MIT Press.
Winston, P. H. 1981. Artificial Intelligence. Addison-Wesley.
Winston, P. H., Binford, T. O., Katz, B. and Lowry, M. 1983. Learning physical descriptions from functional definitions, examples, and precedents. Proceedings of the National Conference on Artificial Intelligence. William Kaufmann.
Wu, C., Whitson, G., McLarty, J., Ermongkonchai, A., and Chang, T. C. 1992. Protein classification artificial neural system. Protein Science. 1(5): 667-677. May 1992.
Yeagle, P. L. 1993. The Membranes of Cells. Second edition. Academic Press.
=Zhang, B-T. and Muhlenbein, H. 1994. Synthesis of sigma-pi neural networks by the breeder genetic programming. Proceedings of the 1994 IEEE World Congress on Computational Intelligence. IEEE Press.
Zhang, X., Fetrow, J. S., Rennie, W. A., Waltz, D. L., and Berg, G. 1993. Automatic derivation of substructures yields novel structural building blocks in globular proteins. In Hunter, L., Searls, D., and Shavlik, J. (editors). Proceedings of the First International Conference on Intelligent Systems for Molecular Biology. AAAI Press.
Index
α-carbon, 432
α-helices, 493
abstraction, 51-53, 79, 93, 130, 170
accuracy measure, 460
actin, 430
action part of if-then rule, 80
active sites, 438
actual variables, 69, 770
actualvariables, 173
address bits, 592
adenine, 429-432
ADF, 1
alpha-beta search, 54
alpha-carbon, 432
alpha-helices, 493
amino acids, 432,434
amino terminal, 432
AND, 298
amide hydrogen, 439
annealing schedule. See simulated annealing.
anti-codon, 431
argument list, 68
argument map, 60
argument trajectory, 544
even-5-parity, evolution of architecture,
544, 547, 555
even-5-parity, evolution of primitives and
sufficiency, 586
arithmetic-performing version, omega loop,
502-504
arithmetic-performing version,
transmembrane, 488-492
arity, 109, 157
Arizona Token Exchange, 706
array method, 265, 266, 657, 659
artificial ant, 349-364, 639-640
assembly code, 31
assigning types to the noninvariant
points, 86
atoms, 31
audit trail
impulse response, 335, 344
lawnmower, 231, 234
letter recognition, 412, 474
automatically defined function, 1
average fitness of the population, 64
average of the average, 166
average of the best, 166
average population fitness, 28
average structural complexity, 98
β-strands, 493
backpropagation, 699
backbone, 432
backpropagation, 704
bacteriorhodopsin, 447
baseline, 64
best generation, 103
best-of-all, 24, 139
best-of-generation, 24, 64
best-of-run, 24
best-so-far, 22, 62
beta strands, 493
binary trees, 704
biochemistry, 429-443
biomass, 298
black box problem, 58, 110, 308. See also
Problems
blindness to relevant variables, 65
blind random search, 54, 64, 159
body, 68
Boids, 701
Boolean constants, 170, 578
Boolean problems. See even-parity and
symmetry and multiplexer
bovine pancreatic trypsin inhibitor, 434, 439,
440
BPTI. See bovine pancreatic trypsin inhibitor
brackets, 392
branch histogram, 545
even-5-parity, evolution of architecture,
549, 556, 557, 559
branch typing, 85-87, 400, 479, 526
break-even point for average structural
complexity, 110
break-even point for computational effort,
109,110
breeder genetic programming, 704
bumblebee, 275-299, 301-305, 694-699
C language, 31, 39, 94, 717
C' carbon, 433
C terminal, 432
C++, 31., U, 706, 7IZ, ZI7
Cα carbon, 432
calculus, 45-51
carboxy terminal, 432
ceiling function, 102
cellular automaton, 415
cellular encoding, 90, 702-704
central place food-foraging, 712
change of representation, 4
six-symmetry, 127
two boxes, 94
chaperon molecule, 438
characteristics of solution, table
even-3-parity, evolution of architecture,
570
even-4-parity, evolution of architecture,
568
even-5-parity, 186
even-5-parity, evolution of architecture,
562
even-5-parity, evolution of primitives and
sufficiency, 588, 590
checkers, 389
chromosome physical maps, 442
chromosomes, 429
chunks, 54
classifier systems, 80-81
closure, 35
co-evolution of populations, 205
co-evolution of fitness case, 709
co-routine model, 712
cobra neurotoxin venom, 495
codon, 431
collagen, 430
column-mowing, 251
Common LISP, ZZ, 69, 455,50g, 661, 6g0
comparison table, 106
artificial ant, 364
bumblebee, 10 flowers, 289
bumblebee, 15 flowers, 287
bumblebee, 20 flowers, 285
bumblebee, 25 flowers, 284
even-3-parity, M = 16,000, 127
even-4-parity, M = 16,000, 181
even-5-parity, M = 16,000, 188
even-6-parity, M = 16,000, 189
five-symmetry, 133
four sines, 141
impulse response, 346
lawnmower, 32 squares, 257
lawnmower, 48 squares, 258
lawnmower, 64 squares, 256
lawnmower, 80 squares, 260
lawnmower, 96 squares, 261
minesweeper, 386
obstacle-avoiding robot, 375
quintic polynomial, 118-120
sextic polynomial, 110-118
six-symmetry, 122-132
subset-creating transmembrane, 492
three sines, 143
three-term expression, 150
two boxes, 106
two-term expression, 153
computational effort, 99, 109
condition part of if-then rule, 80
conformation (of protein), 496
Connection Machine, 707
CONS, 297
constant perturbation operation, 200
constant reuse, 1,4+-LSg
constrained syntactic structure, 392, 419, 499
constraint satisfaction, 54
context-preserving crossove4 712
control engineering, 307
convolution, 308
correlation, 37, 419, 425, 459-462, 462
cosine, 419, 425, 462
creation of initial population, 64, 265, 527-
532
crisscrosser, 253
cross-validate, 420
crossover, 25, 40, 65
branch typing, 85-87, 400, 479, 526
like-branch typing, 97, 400, 479
point typing, 86, 532-539
crossover fragments, 26, 42
crossover point, 26, 42
crystallography, MJ,
cumulative probability of success, 100
cyclic graph, 168
cystic fibrosis transmembrane conductance
regulator, 446
cytosine, 429-432
data bits, 592
decision tree, 391, 392, 418, 699, 706
decision trees, 709
decomposition figure
calculus, 47,49
general, 3
six-symmetry problem, 126
two-boxes, 92
default value, 61
DEFUN, 68,69,70
demes, 700
denomination, 417
deoxyribonucleic acid, 429-430
depth-first search, 54
deterministic finite automata, 704
diploid, 429
directed acyclic graph, 168, 182, 717
disambiguahon, 7W
disjunctive normal form, 175
dishibution of architectures table
even-5-parity, evolution of architecfure,
primitive, 613
even-5-parity, evolution of primitives and
sufficiency, 581
even-5-parity, evolution of architecture,
540
disulfide bond, 435, 440
divide and conquer, 2
DNA, 429-430
DO macro, 680
don't care, 28, 80
Doppler rheometer, 698
double auction, 706
Dow Jones, 707
dummy variables, 68
dynamic memory allocation, 297
dynamical system, 705
efficiency-ratio scaling graph
bumblebee, 302
lawnmower, 302
parity, 302
eight-puzzle, 55
electrically charged, 447, 512
electronic mail, 717
energy minimum, 438
entropy, 37
enzyme, 430
ephemeral random constants. See random
constants.
EQ,418
equivalence, 126, 166
EQV, 126, 166
error as fitness measure, 58, 61
error rate measure, 460, 486
Escherichia coli, 429, 431
Euclidean distance, 37
even-parity, 157-199, 20U223, 301-305,
621-628
even-2-parity, 166, 169
even-3-parity, 158-162, 175-178
even-4-parity, 162, 178-180
even-3-parity, 158-162, 175-178
even-4-parity, evolution of architecture, 572
even-4-parity, evolution of architecture,
primitive, 611, 612
even-4-parity, evolution of closure, 603, 607
even-5-parity, 162-164, 180-188, 204-223
even-5-parity, evolution of architecture, 540
even-5-parity, evolution of architecture,
primitive, 612, 617
even-5-parity, evolution of closure, 607
even-5-parity, evolution of primitives and
sufficiency, 580, 592
even-5-parity, evolution of terminals, 598,
599
even-5-parity, single primitive, 595
even-6-parity, 164-166, 188-189
even-7-parity, 194-195
even-8-parity, 195-196
even-9-parity, 196
even-10-parity, 197
even-11-parity, 197-199
even-k-parity function, 157
evolution of architecture, 525-617
evolution of architecture, primitive
functions, sufficiency, 611-617
evolution of closure, 601-609
evolution of evolution of primitives and
sufficiency, 575-595
evolution of terminals, 575-595
exclusive-or, 166
exons, 431
explanation-based generalization, 55
Explorer computer, 204, 310, 662, 663, 673
exponential regression, 192, 267
EXPP, 310
false-negative, 395
false-positive, 396
feature, 76
file transfer protocol repository, 661, 717-718
fitness cases, 36
fitness curves, 65
artificial ant, with ADFs, 357
impulse response, with ADFs, 327
lawnmower, 64 squares, without ADFs,
^aa
LJl
letter recognition, with ADFs, 404
omega-loop, arithmetic-performing
version, 502
subset-creating omega-loop, 501
subset-creating transmembrane, 477
transmembrane, lookahead version, 519
two boxes, without ADFs, 66
fitness landscape, 716
fitness measure, 21, 36
fitness-branch trajectory, 545
even-4-parity, evolution of architecture,
564
even-5-parity, evolution of architecture,
545, 547, 551, 551, 557
even-5-parity, evolution of architecture
primitive, 616
even-5-parity, evolution of primitives and
sufficiency, 585
fitness-case figure
artificial ant, 353
bumblebee, 276
four sine, 135
letter recognition, 391, 393, 394, 395, 396,
397, 399
obstacle-avoiding robot, 366
quintic polynomial, 118
sextic polynomial, 111
three sine, 142
fitness-case table
even-3-parity, 158
in-sample omega loop, 497
in-sample transmembrane, 464
out-of-sample omega loop, 498
out-of-sample transmembrane, 467
two boxes, 58
five major preparatory steps, 35,60,617
five-symmetry, 132-134
FLET, 70
floating-point random constant, 111, 275,
310
flowchart
genetic algorithm, 21-31, 42-43
genetic programming, introduction,
35-42, 42-43
flushes, 417-427
FOIL, 713
folding problem (proteins), 439, 440, 493
fonts, 699
fonts, table of, 653
forcing function, 308
formal parameters, 68
FORTH, 32
FORTRAN, 31, 33, 34, 59, 70
four sine, 134-144
four-of-a-kind, 417-427
frequently asked questions, 717
FTP repository, 667, 717-718
fully defined, 22,36,61
function, 70
function set, 35
function-defining branch, 69
gain element, 308
garbage collection, 297
generalization, 93, 170
generalization, 50, 52, 420
generate and test, 54
generation, 21
generational approach, 700
genetic algorithm, 21, 31, 704
molecular biology, 442-443
genetic art, 707
genetic code, 431,431
genetic diversity, 26
genome, 429
genotype, 703
geometric interpretation
crossover, 28, 29
mutation, 30, 31
global variable, 3A7
goal, 54
GP-list mailing list, 717
granularity, 111
Group Method of Data Handling, 205
guanine, 429-432
Halobacterium salinarium, 447
Hamming distance, 31, 123, 159
harmonics, 134-144
helix, 434
heme group, 442
hemoglobin, 442
herding behavior, 701
hierarchical automatically defined functions,
167, 168, 170
hierarchical decomposition, 47, 92, 130, 169
hierarchical problem-solving process, 2,
45-56
hierarchies of dependencies, 531
hillclimbing, 54
histogram of generation 0
artificial ant, 640
bumblebee, 15 flowers, 635
bumblebee, 25 flowers, 637
even-3-parity, 623
even-4-parity, 625
even-6-parity, 628
lawnmower, 32 squares, 631
lawnmower, 48 squares, 631
lawnmower, 80 squares, 633
lawnmower, 96 squares, 634
minesweeper, 639
obstacle-avoiding robot, 638
hits, 61
hits criterion, 61, 66, 112, 136, 145
hits histogram, 235
artificial ant with ADFs, 358
impulse response with ADFs, 334
lawnmower, 64 squares, with ADFs, 249
lawnmower, 64 squares, without ADFs,
236
letter recognition, with ADFs, 404
Holland, John, 21
HOMING, 392
Hopp-Woods hydrophobicity scale, 447
human genome, 429
Human Genome Project, 441
hydrophilicity, 447, 449
hydrophobicity, 446-449
hyperplane, 28
identical reuse, 48, 52
IF, 33, 391-392, 418, 577, 592, 593
if-part of if-then rule, 80
if-then rule, 54
IFGTZ, 489
IFLTE, 310, 458
impasse, 54
implicit fitness, 617
impulse response, 307-347
in-line function, 70
in-sample correlation, 470
in-sample fitness cases, 418,463
incest, 26
independent agent, 702
index, 455
indexed memory, 715, 716
individuals that must be processed, 103
inequality, 166
infix notation, 33
information retrieval, 707
initial conditions, 37
initial random generation, 64, 265, 527-532
instantaneous probability of success, 100
instantiation, 50
integer vector random constants, 227
inlenctle,707
interactive fitness, 700, 707
Internet, FTP repository, 667, 717-718
introduction
to biochemistry, 429-443
to genetic algorithms, 21-31
to genetic programming, 35-42
to LISP, 31-34
to molecular biology, 429-443
to transmembrane proteins, 446-452
introns, 298, 431
invariant, 74
invocations of ADFs table
four sines, 138
three-term expression, 149
inverse turn, 495
iteration variable, 455
iteration-performing branch, 456, 472, 507
iteration-terminating branch, 507
iteration, 454-456
jetliners, 700
John Muir trail, 349
jumping column mower, 254
Kabsch and Sander dictionary, 495
Kalman filter, 714
kilobase, 430
kinemage, 499
Kyte-Doolittle hydrophobicity, 447, 449
LABELS, 70
lawnmower, 225-273, 301-305, 628-634
least-squares regression, 190
lens effect, 619-641, 701
lens effect tables
bumblebee, 638
even-parity, 629
lawnmower, 635
lesions, 715
LET, 70, 145
letter recognition, 389-416
Levenberg-Marquardt regression, 705
like-branch typing, 87, 400, 479
Lindenmayer system, 392
linear regression, 58, 191, 264, 266, 290
linear time-invariant system, 30g
LISP, 31-34, 59, 69, 455, 509, 65r, 690
list, 31
locus name, 449
logarithmic scale, 65, 192, 266, 622
lookahead transmembrane, 505-524
lookup table, 174
LOOP macro, 455, 508
looping-over-residues, S0T,S0?
Los Altos Hills trail, 349
Mackey-Glass equations, 698, 705
macro, 351
DO, 680
IF, 391-392
IF-FOOD-AHEAD, 351
IFGTZ, 489
IFLTE, 458, 512
IF-MINE, 377
IF-OBSTACLE, 366
LOOP, 455, 508
macrops, 55
search 55
macrops, 55
mailing list, on-line, 717-718
main chain (of protein), 432
main point one, 6, 95, 649
main point two, 6, 96-97, 643
main point three, 6, 219, 541, 649
main point four, 6, 221, 222, 644
main point five, 7, 264, 644
main point six, 7, 268, 644
main point seven, 7, 304, 644
main point eight, 7, 5Zg, 646
major parameters, 61
major preparatory steps, 35,60,617
Mathematica, 34
MDL. See minimum description length
mean, 64
mean and standard deviations table
bumblebee, 638
even-parity, 629
lawnmower, 635
means-end analysis, 54
median, 64
memory, 512, 715, 716
memory fragmentation, 297
mental model, 715, 716
messenger RNA, 430
Metropolis algorithm. See simulated
annealing.
minesweeper, 377-387, 639
minimum description length, 222, 442, 711
Minkowski distance, 37
minor parameters, 62
molecular biology, 429-443
monitoring strategy, 702
motifs table
even-5-parity, 187
lawnmower, 255
mouse peripheral myelin protein 22, 449
mRNA, 430
multiobjective fitness measure, 37
multiple function-defining branches, 166,
167
multiplexer, 592-594, 594-595
mutation, 26,28,30
mutation point, 26
myoglobin, 430,442
myosin, 430
N terminal, 432
n-sample correlation, 419
name, 68
NAND, 577
native structure, 436
neural networks, 76-90, 297, 702-704
NMR, 441
noise, 701
noise signal, 317,328
noise variable, 598, 611
noninvariant point, 74
NOT, 577
nuclear magnetic resonance, 441
nucleic acids, 429
nucleotide bases, 429
numbering scheme for identifying Boolean
functions, 158
numerically-valued disjunctive function,
459, 512
numerically-valued logic, 457
obstacle-avoiding behavior, 701
obstacle-avoiding robot, 365-376, 637, 712
Occam's razor, 704
odd-2-parity, 166
odd-3-parity, 166, 169
offspring, 26, 42
omega loop, 435, 493-504
on-line mailing list, 717-718
optical character recognition, 699
optimal control, 37
OR, 298, 602
numerically-valued, 458, 512
order, 157
ORN, 458, 512
out-of-sample correlation, 420, 463, 470
out-of-sample fitness cases, 420, 463
output interface, 38, 61
overfitting, 480, 481, 492, 501
overfitting graph
arithmetic-performing transmembrane,
491
omega loop, arithmetic-performing
version, 504
subset-creating omega-loop, 502
subset-creating transmembrane, 482
transmembrane, lookahead version, 521,
523
overflow, 67,310
overprediction, 459
panmictic breeding, 700
Panspermia, 707
parametrized reuse, 49-51, 52, 130, 170
parents, 25, 40
parity problems. See even-parity
parity rule, 179, 185
Park-Miller randomizer, 320, 696
parse tree, 31. See also program tree
parsimony, 704
parsing, 506, 510
PASCAL, 31, 33, 34, 70
pattern recognition
letters, 389-416
omega loops, 493-504
transmembrane domain, 445-492, 504-524
pinochle hands, 417-427
PDB. See protein data bank
peptide bond, 433
percentage of agreement measure, 460
performance curves, 102
artificial ant, with ADFs, 363
artificial ant, without ADFs, 355
bumblebee, with ADFs, 10 flowers, 289
bumblebee, with ADFs, 15 flowers, 287
bumblebee, with ADFs, 20 flowers, 285
bumblebee, with ADFs, 25 flowers, 283
bumblebee, without ADFs, 10 flowers,
289
bumblebee, without ADFs, 15 flowers,
287
bumblebee, without ADFs, 20 flowers,
285
bumblebee, without ADFs, 25 flowers,
280
even-3-parity, evolution of architecture,
56/
even-3-parity, with ADFs, M = 16,000, 177
even-3-parity, without ADFs, M = 16,000,
161
even-4-parity, evolution of architecture,
565
even-4-parity, map {3,3}, M = 4,000, 566
even-4-parity, map {3}, M = 4,000, 566
even-4-parity, with ADFs, M = 16,000, 181
even-4-parity, without ADFs, M = 16,000,
163
even-4-parity, without ADFs, M = 4,000,
567
even-5-parity, evolution of architecture,
541
even-5-parity, evolution of architecture,
primitive, 615
even-5-parity, evolution of closure, 608
even-5-parity, evolution of primitives and
sufficiency, 582
even-5-parity, map {3,3}, M = 4,000,
computer code, 674
even-5-parity, with ADFs, M = 16,000, 187
even-5-parity, with ADFs, M = 4,000, map
12,2,2, 212,21.4
even-5-parity, with ADFs, M = 4,000, map
{2,2,2}, 210
even-5-parity, with ADFs, M = 4,000, map
{2,2}, 208
even-5-parity, with ADFs, M = 4,000, map
{2,3}, 218
even-5-parity, with ADFs, M = 4,000, map
{2}, 206
even-5-parity, with ADFs, M = 4,000, map
{3,3,3,2L3,2r4
even-5-parity, with ADFs, M = 4,000, map
{3,3,3}, 211
even-5-parity, with ADFs, M = 4,000, map
{3,3}, 209
even-5-parity, with ADFs, M = 4,000, map
{3}, 207
even-5-parity, with ADFs, M = 4,000, map
114,4,4,273,215
even-5-parity, with ADFs, M = 4,000, map
{4,4,4}, 211
even-5-parity, with ADFs, M = 4,000, map
14,4ll,2W
even-5-parity, with ADFs, M = 4,000, map
{4}, 207
even-5-parity, without ADFs, M = 16,000,
164
even-5-parity, without ADFs, M = 4,000,
205
even-6-parity, with ADFs, M = 16,000, 189
even-7-parity, with ADFs, M = 4,000, 196
five-symmetry, with ADFs, 133
five-symmetry, without ADFs, 133
four sines, with ADFs, 140
four sines, without ADFs, 136
impulse response, with ADFs, 346
impulse response, without ADFs, 322
lawnmower, 32 squares, without ADFs,
241
lawnmower, 48 squares, with ADFs, 25g
lawnmower, 48 squares, without ADFs,
243
lawnmower, 64 squares, with ADFs, 256
lawnmower, 64 squares, without ADFs,
241
lawnmower, 80 squares, without ADFs,
243
lawnmower, 96 squares, with ADFs, 261
lawnmower, 96 squares, without ADFs,
244
minesweeper, with ADFs, 386
minesweeper, without ADFs, 381
obstacle-avoiding robot, with ADFs, 325
obstacle-avoiding robot, without ADFs,
371
quintic polynomial, with ADFs, 121
quintic polynomial, without ADFs, 120
sextic polynomial, with ADFs, 112
sextic polynomial, without ADFs, 113
six-symmetry, without ADFs, 125
subset-creating transmembrane, 487
three sines, with ADFs, 143
three sines, without ADFs, 143
three-term expression, with ADFs, 149
two boxes, with ADFs, 104
two boxes, without ADFs, 102
two-term expression, without ADFs, 151
PF, 576
phenotype, 703
pi, constant, 144-153
pinochle, 417-427
planning, 702
plant (system), 308
point of insertion, 532
point typing, 86, 526, 532
points, 34, 65
polypeptides, 433
poor man's iteration, 454-456
population, 21
post-translational modifications, 438
postfix notation, 32
postprocessing, 38
prefix notation, 32
PREKIN software, 499
premature convergence, 26
preprocessing, 38
primary parameters, 36
primary structure, 434, 493
problem-solving, hierarchical, 2, 45-56
problem space, 54
Problems
artificial ant, 349-364, 639-640
Boolean. See even-parity and symmetry
and multiplexer
bumblebee, 275-299, 301-305, 694-6gg
even-parity, 157-199, 20UZ2J, 301-305,
621-628
even-3-parity, 158-162, 175-178
even-4-parity, 162, 178-180
even-3-parity, 158-162, 175-178
even-4-parity, evolution of architecture,
572
even-4-parity, evolution of architecture,
primitive, 611, 612
even-4-parity, evolution of closure, 603,
607
even-5-parity, 162-164, 180-188, 204-223
even-5-parity, evolution of architecture,
540
even-5-parity, evolution of architecture,
primitive, 612, 617
even-5-parity, evolution of closure, 607
even-5-parity, evolution of primitives and
sufficiency, 580, 592
even-5-parity, evolution of terminals, 598,
599
even-5-parity, single primitive, 595
even-6-parity, 164-166, 188-189
even-7-parity, 194-195
even-8-parity, 195-196
even-9-parity, 196
even-10-parity, 197
even-11-parity, 197-199
evolution of architecture, 525-617
evolution of architecture, primitive
functions, sufficiency, 611-617
evolution of closure, 601-609
evolution of evolution of primitives and
sufficiency, 575-595
evolution of terminals, 575-595
five-symmetry, 132-134
flushes, 417-427
four sine, 134-144
four-of-a-kinds, 417-427
impulse response, 307-347
lawnmower, 225-273, 301-305, 628-634
lens effect, 619-641, 701
letter recognition, 389-416
lookahead transmembrane, 505-524
minesweeper, 377-387, 639
multiplexer, 592-594, 594-595
obstacle-avoiding robot, 365-376, 637, 712
omega loops, 493-504
parity. See even-parity
quintic polynomial, 118-122
sextic polynomial, 110-122
six-multiplexer, 592-594, 594-595
six-symmetry, 122-1U
three sine, 142-144
three-term expression, 144-153
transmembrane, 445-492, 505-524
two boxes, 57-107
two-term expression, 151-153
procedure, 70
PROGN, 69
program tree
even-3-parity, with ADFs, 175
even-4-parity, simplified result-producing
branch, 179
even-4-parity, with ADFs, 178
illustrative, 34, 41
illustrative program, 535, 539
illustrative program, six-multiplexer, 593
illustrative program, with primitive-defining branch, 581
NAND, primitive-defining branch, 580
sextic polynomial, with ADFs, 115
three-term expression, with ADFs, 148
two boxes, with ADFs, 91
two boxes, without ADFs, 67
projection, 185
protected division function, 60, 310
protected exponential function, 310
Protein Data Bank, 439
protein folding problem, 439, 440, 493
proteins, 430
proteins, primary structure, 434
proteins, quaternary structure, 442
proteins, secondary structure, 4M
proteins, tertiary structure, 436, 442
proteins, roles, 430
public repository, 717
quadratic assignment problem, 709
quadratic regression, 58
quaternary structure, 442
queries, 707
quintic polynomial, 118-122
ramp input, 317, 328
random constants, 111
Boolean, 170, 578
bigger floating-point, 310
floating-point, 111
floating-point vector, 275
integer vectors, 227
ternary, 603
random number generators, 705
raw fitness, 61
rebooting, 298
recursive application, 52, 170
reduced representation, 442
remainder, 26,42
representation scheme, 22
reproduction, 24,65
reselection, 65
restricted iteration, 454455
restriction maps, 442
result designation, 22, 36
result-producing branch, 69, 456
retinal rod cells, 446
return, 69
reuse with modification, 52
rhodopsin, 430, 446
ribosome, 430
robot, 701
rubber-banding, 392
S-expressions, 31
Samuel, Arthur, 1, 389
San Mateo trail, 349, 350
Santa Fe Institute, 43, 706
Santa Fe trail, 349
satisfactory result, 62, 98
saving computer time
Boolean problems, 175, 571
compilation, 175, 671
disjunctive normal form, 175, 671
fitness measure, 671
lookup table, 175, 671
three-term expression, 147
scaling
bumblebee, 290
lawnmower, 268
parity, 190, 194
problem size, 301, 305
scaling, by arity, I2g, 7ST
scaling, by frequency of use of a constant,
145
scaling, by lawn size, 226
scaling, by number of flowers, 275
scaling, by number of harmonics, 135
scaling, by number of repeated roots, 110
scaling, by number of squaring operations,
110
scaling, by order of polynomial, 110
schema, 27
schema fitness, 28
search, 54-55
secondary parameters, 36
secondary structure, 434, 4gg
serial decomposition, 55
set, 453
settable variable, 453, 715
settable variables, 512, 715
setting functions, 512
Sewell Wright, 715
sextic polynomial, 110-122
short unit-square input, 317, 328
short-circuit, 298
side chain, 432, 433
side effects, 70
sigma-pi, 704
simulated annealing, 27, 297, 704, 707-709
sines, 134-144
sinusoidal, 134-144
six-multiplexer, evolution of primitives and
sufficiency, 592
six-multiplexer, single primitive, 595
six-symmetry, 122-LU
sixth major step, 82,201,525
slope, 264,290
Smalltalk, 34
SOAR, 53-56
solution, 62, 98
sorting, 709
special functions table, 651
special symbols table, 647
square input, 317, 327
standardized fitness, 61
state, 512, 715, 716
steady state, 700
step input, 328
stochastic motifs, 442
straddling break-even for computational
effort, 109-155
STRIPS, 55
strongly typed genetic programming, 714
structural complexity curves, 92
artificial ant, with ADFs, 359
lawnmower, 64 squares, with ADFs, 251
lawnmower, 64 squares, without ADFs,
237
two boxes with ADFs, 91
structural complexity ratio, g
structural-complexity-ratio scaling graph
bumblebee, 291
lawnmower, 303
parity, 303
structure-preserving crossover
branch typing, 85-87, 400, 479, 526
like-branch typing, 87, 400, 479
point typing, 86, 532-539
subroutine, 67-72
subset-creating version, omega loop, 500-502
subset-creating version, transmembrane, 471-488
subunit, 442
success predicate, 22
successful run, 98
sufficiency, 22, 35, 597
sufficiency requirement, 62
suit, 417
summary graphs, 106
artificial ant, 364
bumblebee, 10 flowers, 290
bumblebee, 15 flowers, 288
bumblebee, 20 flowers, 286
bumblebee, 25 flowers, 284
even-3-parity, M = 16,000, 177
even-4-parity, M = 16,000, 181
even-5-parity, M = 16,000, 188
five-symmetry, 134
four sines, 141
lawnmower, 32 squares, 258
lawnmower, 48 squares, 258
lawnmower, 64 squares, 256
lawnmower, 80 squares, 260
lawnmower, 96 squares, 261
minesweeper, 387
quintic polynomial, 121
sextic polynomial, 118
six-symmetry, 131
three sines, 144
three-term expression, 150
two boxes, 106
two-term expression, 153
summary table
bumblebee, 291
even-parity, 190
lawnmower, 262
straddling break-even for computational
effort, 1W-755
three control problems, 387
SWAP-I induction, 486
Swedish words, 709
swirler, 252
SWISS-PROT, 441, 462
symbolic expressions, 31
symbolic regression, 58, 110, 308. See also
Problems
symbolic system identification, 58, 110, 308.
See also Problems
symmetry function, 122-134
system identification, 58, 110, 308. See also
Problems
tableau with ADFs
arithmetic-performing transmembrane,
489
artificial ant, 356
bumblebee, 281
even-3-parity, 174
even-3-parity, evolution of architecture,
531
flush, 424
four sines, 137
impulse response, 323
lawnmower, 64 squares, 246
letter recognition, 402
lookahead transmembrane, 514
obstacle-avoiding robot, 373
partial, arithmetic-performing transmembrane, 489
sextic polynomial, 114
six-symmetry, 128
subset-creating transmembrane, 475
three-term expression, 147
two boxes, 84
tableau without ADFs, 62
artificial ant, 354
bumblebee, 277
even-3-parity, 160
even-5-parity, evolution of architecture,
530
flush, 421
four sines, 136
impulse response, 312
lawnmower, 64 squares, 229
letter recognition, 399
obstacle-avoiding robot, 368
sextic polynomial, 112
six-symmetry, 124
three-term expression, 145
transmembrane, 470
temperature. See simulated annealing.
terminal set, 35
termination criterion, 22, 36
ternary random constants, 503
tertiary structure, 436
the best, 64
then-part of if-then rule, 80
three sine, 142-144
three-term expression, 144-153
three-way classification, 425
thymine, 429-432
Tierra, 713
time constant, 308
time-delay element, 308
time-out, 298, 454, 712
trajectory
artificial ant with ADFs, 359, 362, 363
bumblebee with ADFs, 282
bumblebee without ADFs, 279
lawnmower, 64 squares, with ADFs, 247,
251, 252, 253, 255
lawnmower, 64 squares, without ADFs,
231, 233, 239, 239
letter recognition with ADFs, 406
minesweeper with ADFs, 384, 385
minesweeper without ADFs, 379, 380
obstacle-avoiding robot without ADFs,
370, 371
transmembrane proteins, 445-492, 504-524
arithmetic-performing, 48H92
introduction to, 446-452
lookahead version, 504-524
subset-creating, 471-488
transcription, 430
transcription and translation, 432
transfer function, 307
transfer RNA, 431
translation, 430
translation invariance, 415, 416
transmembrane, 445-492, 505-524
transmembrane domain, 446
transmembrane protein, 446
tRNA, 431
true-negative, 396
true-positive, 396
truth table, 158, 576
even-2-parity, 615
even-3-parity, 158
IF, 578
IF with negated arguments, 594
NAND, 578
NOT, 578
OR with undefined arguments, 603
unnamed PF, 604, 605, 606, 607, 60g, 619
Turing complete, 216
turtle, 390
two boxes, 57-107
two-term expression, 151-153
types, 74
:UNDEFINED, 601-602
underflow, 61, 310
underprediction, 459
unification, 54
universal subgoaling, 54
unit-step input, 317
uracil, 430
value-returning branch, 69
VALUES, 69, 75
variety, 67
variety curve, 67
two boxes without ADFs, 6g
vector random constants, 227, 275
vertical intercept, 264, 290
virtual reality, 707
wallclock ratio table
bumblebee, 293
lawnmower, ZTS
wallclock time, 268-279, Zgg-Zgg, S4Z
wallclock ratio scaling graph
bumblebee, 305
lawnmower, 305
weak methods, 54
weight sharing, 80
worst-of-generation, 64
wrapper, 38, 61, 4IS, 459
wrong-positive, 397
XOR, 166